Python Data Platform Engineer
Company: Compunnel Software Group
Location: Montreal, QC H1A 0A1
Description:
Job Duties:
As a Python Data Platform Engineer, you will join the C3 Data Warehouse team within the Controls Engineering, Measurement and Analytics (CEMA) department, focusing on building our next-generation data platform, which sources and stores data from technology systems across the firm in a centralized platform that powers reporting and analytics solutions for the Technology Risk functions within Morgan Stanley. In this role you will primarily contribute to the development of a unified data pipeline framework written in Python, using technologies such as Airflow, DBT, Spark, and Snowflake. You will also contribute to integrating this framework with existing internal platforms for data quality, data cataloging, data discovery, incident logging, and metric generation. You will work closely with data warehousing leads, data analysts, ETL developers, infrastructure engineers, and data analytics teams to facilitate the implementation of this data platform and data pipeline framework.
KEY RESPONSIBILITIES:
" To develop various components in Python of our unified data pipeline framework.
" To contribute towards the establishment of best practices for the optimal and efficient usage of Airflow, DBT and Snowflake.
" To assist with the testing and deployment of our data pipeline framework utilizing standard testing frameworks and CI/CD tooling.
" To monitor the performance of queries and data loads and perform tuning as necessary.
" To provide assistance and guidance during the QA & UAT phases to quickly confirm the validity of potential issues and to determine the root cause and best resolution of verified issues.
Minimum Skills Required:
" Bachelor's degree in Computer Science, Software Engineering, Information Technology, or related field required.
" At least 7 years of experience in data development and solutions in highly complex data environments with large data volumes.
" At least 7 years of SQL / PLSQL experience with the ability to write ad-hoc and complex queries to perform data analysis.
" At least 5 years of experience developing data pipelines and data warehousing solutions using Python and libraries such as Pandas, NumPy, PySpark, etc.
" At least 3 years of experience developing solutions in a hybrid data environment (on-Prem and Cloud)
" At least 3 years of experience developing Airflow DAGs to orchestrate data pipelines that utilize branching, dynamic DAG / task generation, and error handling.
" Hands on experience with developing data pipelines for structured, semi-structured, and unstructured data and experience integrating with their supporting stores (e.g. RDBMS, NoSQL DBs, Document DBs, Log Files, etc.)
" Hands on experience with Snowflake a must.
" Hands on experience with Apache Spark a must.
" Hands on experience with DBT preferred.
" Experience with performance tuning SQL queries, Spark job, and stored procedures.
" An understanding of E-R data models (conceptual, logical, and physical).
" Understanding of advanced data warehouse concepts (Factless Fact Tables, Temporal \ Bi-Temporal models, etc.) a plus.
" Strong analytical skills, including a thorough understanding of how to interpret customer business requirements and translate them into technical designs and solutions.
" Strong communication skills both verbal and written. Capable of collaborating effectively across a variety of IT and Business groups, across regions, roles and able to interact effectively with all levels.
" Self-starter. Proven ability to manage multiple, concurrent projects with minimal supervision. Can manage a complex ever changing priority list and resolve conflicts to competing priorities.
" Strong problem-solving skills. Ability to identify where focus is needed and bring clarity to business objectives, requirements, and priorities.
Education: Bachelor's Degree
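For illustration only: below is a minimal sketch of the kind of Airflow DAG the orchestration requirement refers to, showing branching, dynamic task generation, and basic error handling. It assumes Airflow 2.4+ (TaskFlow API, @task.branch, and .expand() task mapping); all table names, schedules, and helper tasks are hypothetical placeholders, not part of any actual Compunnel or Morgan Stanley framework.

```python
from datetime import datetime, timedelta

from airflow.decorators import dag, task

default_args = {
    # Basic error handling: retry transient failures before marking the task failed.
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, default_args=default_args)
def example_ingest_pipeline():
    @task
    def list_source_tables():
        # Hypothetical source list; in practice this might come from a data catalog or config store.
        return ["trades", "positions", "risk_metrics"]

    @task
    def load_table(table: str):
        # Placeholder for per-table extract/load logic (e.g. a Spark job or Snowflake COPY).
        print(f"loading {table}")

    @task.branch
    def choose_refresh_type(ds=None):
        # Branching: pick exactly one downstream path based on the logical date (illustrative rule).
        is_monday = datetime.strptime(ds, "%Y-%m-%d").weekday() == 0
        return "full_refresh" if is_monday else "incremental_refresh"

    @task
    def full_refresh():
        print("running full refresh")

    @task
    def incremental_refresh():
        print("running incremental refresh")

    # Dynamic task generation: one load_table instance per table discovered at runtime.
    loads = load_table.expand(table=list_source_tables())

    # After the loads, the branch task selects which refresh path runs for this execution.
    loads >> choose_refresh_type() >> [full_refresh(), incremental_refresh()]


example_ingest_pipeline()
```

Dynamic mapping with .expand() keeps the DAG definition fixed while the number of load tasks tracks the source inventory, and the branch task ensures only one of the two refresh paths runs per execution.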