System Administrator (GCP/AWS/Azure, PySpark, BigQuery, and Google Airflow)
Company: Macpower Digital Assets Edge
Location: San Jose, CA 95123
Description:
Job Overview: This role involves managing and optimizing Big Data environments (PySpark, BigQuery, Airflow) across Google Cloud, AWS, or Azure platforms, ensuring efficient, secure, and cost-effective operations. Key responsibilities include 24x7 support, data pipeline optimization, automation, and troubleshooting, with a strong emphasis on DevOps, CI/CD, and disaster recovery.
Roles and Responsibilities: (Google Cloud/AWS/Azure, PySpark, BigQuery, and Google Airflow)
- Participate in 24x7x365 rotational shift support and operations for SAP environments.
- Serve as a team lead responsible for maintaining the upstream Big Data ecosystem, handling millions of financial transactions daily using PySpark, BigQuery, Dataproc, and Google Airflow.
- Streamline and optimize existing Big Data systems and pipelines while developing new ones, ensuring efficient and cost-effective performance.
- Manage the operations team during your designated shift and make necessary changes to the underlying infrastructure.
- Provide day-to-day support, improve platform functionality using DevOps practices, and collaborate with development teams to enhance database operations.
- Architect and optimize data warehouse solutions using BigQuery to enable efficient data storage and retrieval.
- Install, build, patch, upgrade, and configure Big Data applications.
- Administer and configure BigQuery environments, including datasets and tables.
- Ensure data integrity, availability, and security on the BigQuery platform.
- Implement partitioning and clustering strategies for optimized query performance.
- Define and enforce access policies for BigQuery datasets.
- Set up query usage caps and alerts to control costs and prevent overages.
- Troubleshoot issues in Linux-based systems with strong command-line proficiency.
- Create and maintain dashboards and reports to monitor key metrics such as cost and performance.
- Integrate BigQuery with other GCP services like Dataflow, Pub/Sub, and Cloud Storage.
- Enable BigQuery usage through tools such as Jupyter Notebook, Visual Studio Code, and CLI utilities.
- Implement data quality checks and validation processes to maintain data accuracy.
- Manage and monitor data pipelines using Airflow and CI/CD tools like Jenkins and Screwdriver.
- Collaborate with data analysts and scientists to gather data requirements and translate them into technical implementations.
- Provide guidance and support to application development teams for database design, deployment, and monitoring.
- Demonstrate proficiency in Unix/Linux fundamentals, scripting in Shell/Perl/Python, and using Ansible for automation.
- Contribute to disaster recovery planning and ensure high availability, including backup and restore operations.
- Experience with geo-redundant databases and Red Hat clustering is a plus.
- Ensure timely delivery within defined SLAs and project milestones, adhering to best practices for continuous improvement.
- Coordinate with supporting teams, including database, Google, PySpark data engineering, and infrastructure teams.
- Participate in Incident, Change, Release, and Problem Management processes.
Must Have Skills, Experience:
- 4-8 years of relevant experience.
- Strong experience with Big Data technologies including PySpark, BigQuery, and Google Airflow.
- Hands-on expertise in cloud platforms (Google Cloud, AWS, or Azure) and Linux system troubleshooting.
- Proficiency in automation and DevOps tools such as Shell/Python scripting, CI/CD processes, and Ansible.