Lead Data Engineer

Company: My3Tech

Location: San Francisco, CA 94112

Description:

Responsibilities

Development Tasks:
  • Collect metrics based on user interactions.
  • Visualize data for business teams.
  • Develop and redesign data pipelines using Kafka Streams.
  • Implement solutions using Spring Boot (Java) and Databricks Spark streaming.


Leadership Duties:
  • Lead the measurement processes from requirements gathering to production delivery.
  • Collaborate with other team leads, business partners, and product managers.
  • Balance hands-on engineering (50%) with team leadership (50%).


Collaboration Structure:
  • Onsite: Lead role (this resource)
  • Nearshore: Senior developer
  • Offshore: Data engineer


Lead Data Engineer - Job Description

Required Skills & Experience:
  • Hands-on coding mindset with a deep understanding of the technology stack and an ability to see the larger picture.
  • Sound knowledge of architectural patterns, best practices, and non-functional requirements.
  • 8-10 years of overall experience in high-volume data processing, data platforms, data lakes, big data, data warehouses, or equivalent.
  • 5+ years of experience with strong proficiency in Python and Spark (must-have).
  • 3+ years of hands-on experience in ETL workflows using Spark and Python.
  • 4+ years of experience with large-scale data loads, feature extraction, and data processing pipelines in different modes: batch, near real-time, and real-time.
  • Solid understanding of data quality and data accuracy concepts and practices.
  • 3+ years of solid experience building and deploying ML models in a production setting. Ability to quickly adapt and handle data preprocessing, feature engineering, and model engineering as needed.
  • Preferred: Experience with one or more Python deep learning libraries such as PyTorch, TensorFlow, Keras, or equivalent.
  • Preferred: Prior experience working with LLMs and transformers. Must be able to work through all phases of model development as needed.
  • Experience integrating with various data stores, including:
    • SQL/NoSQL databases
    • In-memory stores like Redis
    • Data lakes (e.g., Delta Lake)
  • Experience with Kafka Streams, producers, and consumers.
  • Required: Experience with Databricks or similar data lake / data platform.
  • Required: Java and Spring Boot experience for data processing, both near real-time and batch.
  • Familiarity with notebook-based environments such as Jupyter Notebook.
  • Adaptability: Must be open to learning new technologies and approaches.
  • Initiative: Ability to take ownership of tasks, learn independently, and innovate.
  • With the technology landscape changing rapidly, the ability and willingness to learn new technologies as needed and deliver results on the job.

Preferred Skills:
  • Ability to pivot from conventional approaches and develop creative solutions.


Required Skills : Python and PySpark. Kafka and Kafka Streams. MySQL and MySQL Heat. Azure Delta Lake. ETL processes. Kafka integrations using Spring Boot (Java). Data streaming with Spark.

Background Check : Yes

Drug Screen : No

Exclusive to Apex : No
Face to face interview required : No
Candidate must be local : No
Candidate must be authorized to work without sponsorship : No
Interview times set : No