System Administrator (GCP/AWS/Azure, PySpark, BigQuery, and Google Airflow)

Company: Macpower Digital Assets Edge

Location: San Jose, CA 95123

Description:

Job Overview: This role involves managing and optimizing Big Data environments (PySpark, BigQuery, Airflow) across Google Cloud, AWS, or Azure platforms, ensuring efficient, secure, and cost-effective operations. Key responsibilities include 24x7 support, data pipeline optimization, automation, and troubleshooting, with a strong emphasis on DevOps, CI/CD, and disaster recovery.

Roles and Responsibilities: (Google Cloud/AWS/Azure, PySpark, BigQuery, and Google Airflow)
  • Participate in 24x7x365 rotational shift support and operations for SAP environments.
  • Serve as a team lead responsible for maintaining the upstream Big Data ecosystem, handling millions of financial transactions daily using PySpark, BigQuery, Dataproc, and Google Airflow.
  • Streamline and optimize existing Big Data systems and pipelines while developing new ones, ensuring efficient and cost-effective performance.
  • Manage the operations team during your designated shift and make necessary changes to the underlying infrastructure.
  • Provide day-to-day support, improve platform functionality using DevOps practices, and collaborate with development teams to enhance database operations.
  • Architect and optimize data warehouse solutions using BigQuery to enable efficient data storage and retrieval.
  • Install, build, patch, upgrade, and configure Big Data applications.
  • Administer and configure BigQuery environments, including datasets and tables.
  • Ensure data integrity, availability, and security on the BigQuery platform.
  • Implement partitioning and clustering strategies for optimized query performance (see the BigQuery sketch after this list).
  • Define and enforce access policies for BigQuery datasets.
  • Set up query usage caps and alerts to control costs and prevent overages.
  • Troubleshoot issues in Linux-based systems with strong command-line proficiency.
  • Create and maintain dashboards and reports to monitor key metrics such as cost and performance.
  • Integrate BigQuery with other GCP services like Dataflow, Pub/Sub, and Cloud Storage.
  • Enable BigQuery usage through tools such as Jupyter Notebook, Visual Studio Code, and CLI utilities.
  • Implement data quality checks and validation processes to maintain data accuracy.
  • Manage and monitor data pipelines using Airflow and CI/CD tools like Jenkins and Screwdriver (see the DAG sketch after this list).
  • Collaborate with data analysts and scientists to gather data requirements and translate them into technical implementations.
  • Provide guidance and support to application development teams for database design, deployment, and monitoring.
  • Demonstrate proficiency in Unix/Linux fundamentals, scripting in Shell/Perl/Python, and using Ansible for automation.
  • Contribute to disaster recovery planning and ensure high availability, including backup and restore operations.
  • Experience with geo-redundant databases and Red Hat clustering is a plus.
  • Ensure timely delivery within defined SLAs and project milestones, adhering to best practices for continuous improvement.
  • Coordinate with supporting teams, including database, Google, PySpark data engineering, and infrastructure teams.
  • Participate in Incident, Change, Release, and Problem Management processes.
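
As context for the partitioning, clustering, and cost-control items above, the following is a minimal illustrative sketch using the google-cloud-bigquery Python client. The project, dataset, table, column names, and byte cap are hypothetical placeholders, not details of this role's actual environment.

    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")  # hypothetical project ID

    # Define a day-partitioned, clustered table for transaction data (illustrative schema).
    table = bigquery.Table(
        "example-project.finance.transactions",
        schema=[
            bigquery.SchemaField("txn_id", "STRING"),
            bigquery.SchemaField("customer_id", "STRING"),
            bigquery.SchemaField("amount", "NUMERIC"),
            bigquery.SchemaField("event_date", "DATE"),
        ],
    )
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="event_date"
    )
    table.clustering_fields = ["customer_id"]
    client.create_table(table, exists_ok=True)

    # Cap bytes billed per query to guard against runaway costs.
    job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)  # 10 GiB
    query = """
        SELECT customer_id, SUM(amount) AS total
        FROM `example-project.finance.transactions`
        WHERE event_date = CURRENT_DATE()
        GROUP BY customer_id
    """
    rows = client.query(query, job_config=job_config).result()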
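
Likewise, for the Airflow pipeline-management item, here is a minimal DAG sketch that submits a daily PySpark job to Dataproc via a BashOperator; the DAG ID, schedule, bucket, cluster, and region are assumed for illustration, and a production pipeline would typically use the dedicated Dataproc operators instead.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily pipeline that submits a PySpark job to a Dataproc cluster.
    with DAG(
        dag_id="daily_transactions_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        submit_pyspark = BashOperator(
            task_id="submit_pyspark_job",
            bash_command=(
                "gcloud dataproc jobs submit pyspark "
                "gs://example-bucket/jobs/transform_transactions.py "
                "--cluster=example-cluster --region=us-central1"
            ),
        )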


Must-Have Skills and Experience:
  • 4-8 years of relevant experience.
  • Strong experience with Big Data technologies including PySpark, BigQuery, and Google Airflow.
  • Hands-on expertise in cloud platforms (Google Cloud, AWS, or Azure) and Linux system troubleshooting.
  • Proficiency in automation and DevOps tools such as Shell/Python scripting, CI/CD processes, and Ansible.
