SRE - Automation Engineer
Company: CyberThink Inc.
Location: Austin, TX 78745
Job Description:
As an SRE Automation Engineer, you will design and implement scalable, resilient, and intelligent automation solutions to enhance operational efficiency. This role requires a strong systems engineering background and an automation-first mindset to reduce manual toil and optimize large-scale cloud environments. You will automate infrastructure, integrate tools via APIs, enhance observability, and implement AIOps-driven solutions. This position is ideal for individuals passionate about problem-solving, AI/ML in operations, and driving innovation in automation.
Key Responsibilities:
- Develop Python-based automation solutions for on-prem (Pivotal Cloud Foundry, Windows & Linux VMs) and cloud infrastructure on GCP and Kubernetes.
- Continuously identify and implement improvements to enhance operational excellence.
- Build scalable and proactive automation solutions.
- Implement and manage configuration automation using Ansible (preferred).
- Integrate various tools and services via APIs and client libraries for seamless interoperability.
- Enhance deployment reliability through automated chaos strategies, failover mechanisms, and self-healing infrastructure.
- Develop proactive monitoring and alerting solutions using Splunk, GCP Operations Suite, Grafana, and Prometheus.
- Conduct deep root cause analysis (RCA) and incident management for system failures, developing automation to prevent recurrence.
- Improve system resilience and tune performance for mission-critical applications.
- Apply AI/ML techniques to automation workflows, enhancing anomaly detection, predictive scaling, and intelligent alerting.
Required Skills, Experiences, Education, and Competencies:
- Strong background in systems engineering with a focus on automation and reliability.
- Proficiency in Python (intermediate to expert level) for developing automation and integrations.
- Hands-on expertise with Kubernetes and cloud platforms (GCP or any major cloud).
- Experience integrating tools and platforms via APIs and client libraries.
- Deep understanding of monitoring and alerting using Splunk, GCP Operations Suite, Grafana, and Prometheus.
- Ability to operate in high-stakes environments where reliability and uptime are critical.
- Strong problem-solving skills to navigate uncertainty and complex challenges.
- Experience with Ansible for infrastructure automation.
- Prior experience working in mission-critical teams managing large-scale, high-availability systems.
- Enthusiasm for AI/ML and AIOps, with a passion for applying them in automation and operations.
The hourly range for roles of this nature is $40.00 to $80.00/hr. Rates are heavily dependent on skills, experience, location, and industry.
cyberThink is an Equal Opportunity Employer.