Lead Data Engineer

Company: My3Tech

Location: San Francisco, CA 94112

Description:

Responsibilities

Development Tasks:
  • Collect metrics based on user interactions.
  • Visualize data for business teams.
  • Develop and redesign data pipelines using Kafka Streams.
  • Implement solutions using Spring Boot (Java) and Databricks Spark streaming.


Leadership Duties:
  • Lead the measurement processes from requirements gathering to production delivery.
  • Collaborate with other team leads, business partners, and product managers.
  • Balance hands-on engineering (50%) with team leadership (50%).


Collaboration Structure:
  • Onsite: Lead role (this resource)
  • Nearshore: Senior developer
  • Offshore: Data engineer


Lead Data Engineer - Job Description

Required Skills & Experience:
  • Hands-on coding mindset with a deep understanding of the technology stack and an ability to see the larger picture.
  • Sound knowledge of architectural patterns, best practices, and non-functional requirements.
  • 8-10 years of overall experience in high-volume data processing, data platforms, data lakes, big data, data warehouses, or equivalent.
  • 5+ years of experience with strong proficiency in Python and Spark (must-have).
  • 3+ years of hands-on experience in ETL workflows using Spark and Python.
  • 4+ years of experience with large-scale data loads, feature extraction, and data processing pipelines in different modes: batch, near real-time, and real-time.
  • Solid understanding of data quality and data accuracy concepts and practices.
  • 3+ years of solid experience building and deploying ML models in a production setting. Ability to quickly adapt and handle data preprocessing, feature engineering, and model engineering as needed.
  • Preferred: Experience with one or more Python deep learning libraries such as PyTorch, TensorFlow, Keras, or equivalent.
  • Preferred: Prior experience working with LLMs and transformers. Must be able to work through all phases of model development as needed.
  • Experience integrating with various data stores, including:
    • SQL/NoSQL databases
    • In-memory stores like Redis
    • Data lakes (e.g., Delta Lake)
  • Experience with Kafka Streams, producers, and consumers.
  • Required: Experience with Databricks or similar data lake / data platform.
  • Required: Java and Spring Boot experience for data processing, both near real-time and batch.
  • Familiarity with notebook-based environments such as Jupyter Notebook.
  • Adaptability: Must be open to learning new technologies and approaches.
  • Initiative: Ability to take ownership of tasks, learn independently, and innovate.
  • With the technology landscape changing rapidly, the ability and willingness to learn new technologies as needed and deliver results on the job.

Preferred Skills:
  • Ability to pivot from conventional approaches and develop creative solutions.


Required Skills : Python and PySpark. Kafka and Kafka Streams. MySQL and MySQL Heat. Azure Delta Lake. ETL processes. Kafka integrations using Spring Boot (Java). Data streaming with Spark.

Background Check : Yes

Drug Screen : No

Exclusive to Apex : No
Face to face interview required : No
Candidate must be local : No
Candidate must be authorized to work without sponsorship : No
Interview times set : No