SRE - Automation Engineer

Company: CyberThink Inc.

Location: Austin, TX 78745

Job Description:
As an SRE Automation Engineer, you will design and implement scalable, resilient, and intelligent automation solutions to enhance operational efficiency. This role requires a strong systems engineering background and an automation-first mindset to drive efficiency, reduce manual toil, and optimize large-scale cloud environments. You will automate infrastructure, integrate tools via APIs, enhance observability, and implement AIOps-driven solutions. This position is ideal for individuals passionate about problem-solving, AI/ML in operations, and driving innovation in automation.

Key Responsibilities:
  • Develop Python-based automation solutions for on-prem (Pivotal Cloud Foundry, Windows & Linux VMs) and cloud infrastructure on GCP and Kubernetes.
  • Continuously identify and implement improvements to enhance operational excellence.
  • Build scalable and proactive automation solutions.
  • Implement and manage configuration automation using Ansible (preferred).
  • Integrate various tools and services via APIs and client libraries for seamless interoperability.
  • Enhance deployment reliability through automated chaos strategies, failover mechanisms, and self-healing infrastructure.
  • Develop proactive monitoring and alerting solutions using Splunk, GCP Operations Suite, Grafana, and Prometheus.
  • Conduct deep root cause analysis (RCA) and incident management for system failures, developing automation to prevent recurrence.
  • Improve system resilience and tune performance for mission-critical applications.
  • Apply AI/ML techniques to automation workflows, enhancing anomaly detection, predictive scaling, and intelligent alerting.

Required Skills, Experiences, Education, and Competencies:
  • Strong background in systems engineering with a focus on automation and reliability.
  • Proficiency in Python (intermediate to expert level) for developing automation and integrations.
  • Hands-on expertise with Kubernetes and cloud platforms (GCP or any major cloud).
  • Experience integrating tools and platforms via APIs and client libraries.
  • Deep understanding of monitoring and alerting using Splunk, GCP Operations Suite, Grafana, and Prometheus.
  • Ability to operate in high-stakes environments where reliability and uptime are critical.
  • Strong problem-solving skills to navigate uncertainty and complex challenges.
  • Experience with Ansible for infrastructure automation.
  • Prior experience working in mission-critical teams managing large-scale, high-availability systems.
  • Enthusiasm for AI/ML and AIOps, with a passion for applying them in automation and operations.


The hourly range for roles of this nature is $40.00 to $80.00/hr. Rates are heavily dependent on skills, experience, location, and industry.

cyberThink is an Equal Opportunity Employer.
