Data Engineer
Company: My3Tech
Location: Suffolk, VA 23434
Description:
Essential Responsibilities:
- Build and Integrate Data Pipelines: Design, integrate, and implement batch ETL processes for data from diverse source systems into our Databricks environment, contributing to the expansion and optimization of our cloud-based data lake (Lakehouse).
- Data Quality and Integrity: Ensure pipelines meet high standards of data quality and integrity, implementing rigorous validation, cleansing, and enrichment processes on large volumes of banking data.
- Maintain historical data for auditability and regulatory compliance, leveraging Delta Lake's ACID transactions and table versioning (see the illustrative snippet after this list).
- Performance Optimization: Optimize data processing performance on Databricks (e.g. efficient Spark SQL, partitioning techniques) and manage ETL job scheduling and dependencies to meet business SLAs for data timeliness.
- Governance and Compliance: Adhere to enterprise data governance policies and implement security best practices for sensitive financial data.
- Ensure compliance with banking regulations by enforcing access controls, encryption, and data lineage tracking across pipelines.
- Cross-Team Collaboration: Work closely with data architects, analysts, and business stakeholders to gather requirements and translate banking domain needs into scalable data solutions. Collaborate with BI, risk, and data science teams to support analytics and machine learning initiatives with robust data feeds.
- Continuous Improvement: Identify and implement improvements (including automating repeatable workflows) to enhance pipeline stability, efficiency, and future scalability. Keep the data platform up-to-date with industry best practices and emerging Databricks features.
- Adhere to applicable federal laws, rules, and regulations, including those related to Anti-Money Laundering (AML) and the Bank Secrecy Act (BSA).
- Other duties as assigned.
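For illustration only, a minimal PySpark sketch of the kind of batch ETL step these responsibilities describe: a raw extract is validated, enriched, and appended to a Delta Lake table, whose versioned history supports the auditability noted above. All paths, table names, and columns are placeholders, not actual project details.

    # Illustrative sketch only; paths, table, and column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

    # Read one batch extract from a hypothetical landing zone.
    raw = spark.read.json("/mnt/landing/transactions/")

    # Basic validation and enrichment before loading into the Lakehouse.
    cleaned = (
        raw.dropDuplicates(["transaction_id"])
           .filter(F.col("amount").isNotNull())
           .withColumn("ingested_at", F.current_timestamp())
    )

    # Append into a Delta table; each write is an ACID-compliant, versioned commit.
    cleaned.write.format("delta").mode("append").saveAsTable("lakehouse.transactions")

    # Delta retains table history, so earlier versions stay queryable for audits.
    audit_view = spark.sql("SELECT * FROM lakehouse.transactions VERSION AS OF 0")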
Minimum Required Skills & Competencies:
- Bachelor's degree in computer science or related field (or equivalent practical experience). 3+ years of experience as a data engineer in complex, large-scale data environments, preferably in the cloud.
- Strong hands-on expertise with Databricks and the Apache Spark ecosystem (PySpark, Spark SQL) for building large-scale data pipelines. Experience working with Delta Lake tables and Lakehouse architectural patterns for data management.
- Databricks Delta Live Tables (DLT): Experience using Delta Live Tables to build automated, declarative ETL pipelines on Databricks (a minimal DLT sketch follows this list).
- Proficient in Python (including PySpark) for data processing tasks. Solid coding skills in SQL for complex querying and data transformation (Scala or Java experience is a plus).
- Experience with at least one major cloud platform (Azure preferred) and its data services (e.g., Azure Data Lake Storage, Amazon S3, Google BigQuery). Familiarity with cloud-based ETL tools and infrastructure (e.g., Azure Data Factory, AWS Glue) for scalable storage and processing.
- Strong understanding of data modeling and data warehousing concepts, including designing relational schemas and dimensional models (OLTP/OLAP, star schemas, etc.) for analytics.
- Experience designing end-to-end data pipeline architectures, including orchestration and workflow scheduling. Familiarity with pipeline orchestration tools (Databricks Jobs, Apache Airflow, or Azure Data Factory) to automate and manage complex workflows.
- Hands-on experience implementing data quality checks (unit tests, data validation rules) and monitoring ETL pipelines to ensure the accuracy and consistency of data outputs.
- Knowledge of data governance standards and security best practices for managing sensitive data. Understanding of compliance requirements in banking (e.g., encryption, PII handling, auditing) and ability to enforce data access controls and documentation of data lineage.
- Experience using version control (Git) and CI/CD pipelines for code deployment. Comfortable with DevOps practices to package, test, and deploy data pipeline code in a controlled, repeatable manner.
- Strong problem-solving skills with an ability to troubleshoot complex data issues. Capable of translating business requirements into efficient, reliable ETL solutions and optimizing workflows for performance and cost-efficiency.
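As a rough illustration of the Delta Live Tables and data-quality items above, the sketch below shows a declarative DLT pipeline with an expectation that drops invalid rows. It assumes a Databricks DLT pipeline context; dataset names, columns, and the source path are hypothetical.

    # Runs inside a Databricks Delta Live Tables pipeline; names are hypothetical.
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw transactions ingested from cloud storage (placeholder source).")
    def raw_transactions():
        return spark.read.format("json").load("/mnt/landing/transactions/")

    # Declarative data-quality rule: rows failing the expectation are dropped.
    @dlt.table(comment="Validated transactions for downstream analytics.")
    @dlt.expect_or_drop("valid_amount", "amount IS NOT NULL AND amount >= 0")
    def clean_transactions():
        return dlt.read("raw_transactions").withColumn("processed_at", F.current_timestamp())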
Desired Skills & Competencies:
- Familiarity with the banking sector's data and processes (e.g. retail banking transactions, investment trading data, fraud detection, risk analytics) is a strong plus.
- Understanding financial services terminology or prior experience on finance data projects can help contextualize data engineering work.
- Exposure to real-time data streaming and event-driven architectures. Knowledge of Spark Structured Streaming or Kafka for ingesting and processing streaming data alongside batch workflows is a plus (see the streaming sketch after this list).
- Experience building data pipelines for regulatory reporting or compliance use cases in finance. Familiarity with ensuring consistency, integrity, and timeliness of data in regulatory pipelines (e.g. for CCAR, AML, or Basel reporting) would set a candidate apart.
- Understanding of DataOps techniques (automated testing, monitoring, and CI/CD for data pipelines) or MLOps integration to support machine learning data requirements. Experience with tools and frameworks that improve the automation and reliability of data workflows is a plus.
- Relevant industry certifications can be an advantage, for example Databricks Certified Data Engineer or cloud platform certifications in data engineering. These demonstrate a commitment to staying current with evolving data technologies.
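For the streaming item above, a hedged Spark Structured Streaming sketch that reads events from a Kafka topic and lands them in a Delta table alongside batch loads; the broker address, topic, and paths are placeholders, not real endpoints.

    # Illustrative only; the broker address, topic, and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
             .option("subscribe", "payments")                    # placeholder topic
             .load()
             .select(F.col("value").cast("string").alias("payload"), F.col("timestamp"))
    )

    # Continuously append events to a Delta table next to the batch-loaded data.
    query = (
        events.writeStream.format("delta")
              .option("checkpointLocation", "/mnt/checkpoints/payments/")  # placeholder
              .outputMode("append")
              .start("/mnt/lakehouse/payments_stream/")
    )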
Required Skills : Azure - 3 years of experience; Databricks - 3 years of experience (slightly less is acceptable); Spark - 3 years of experience; ETL - 3 years of experience
Basic Qualification :
Additional Skills :
Contract to perm; hybrid on-site
Background Check : Yes
Drug Screen : Yes
Notes :
Selling points for candidate : Contract to perm; hybrid on-site
Project Verification Info :
Exclusive to Apex :Yes
Face to face interview required :No
Candidate must be local :No
Candidate must be authorized to work without sponsorship :Yes
Interview times set : No
Type of project :
Master Job Title :
Branch Code :