SRE Lead


Company: Cloud BC Labs

Location: Dallas, TX 75217

Description:

Role: SRE Lead

Location: Dallas, TX (Onsite)

Term: W2

JD:

Required Skills & Experience

As a Senior SRE Lead, you will lead the implementation, optimization, and maintenance of production systems at the customer site. You will work closely with cross-functional teams, including development, operations, and business stakeholders, to ensure high availability, performance, and resilience of applications and infrastructure. Your expertise in automation, monitoring, incident management, and cloud cost optimization will be critical to driving operational excellence and financial efficiency.

Key Responsibilities:

1. System Reliability and Performance
Design, implement, and maintain highly available, scalable, and resilient systems.
Monitor system health and performance using tools like Splunk, Dynatrace, Prometheus, Grafana, or similar platforms.
Define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to measure system reliability.
Perform root cause analysis (RCA) for incidents and implement preventive measures to avoid recurrence.

2. Automation and Tooling
Automate repetitive tasks such as deployments, scaling, and monitoring using scripting languages (e.g., Python, Bash, PowerShell).
Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, Ansible, or CloudFormation.
Build and optimize CI/CD pipelines to streamline application delivery processes.

3. Incident Management and On-Call Support
Lead incident response efforts, coordinating with internal and customer teams to resolve issues quickly.
Participate in an on-call rotation to provide 24x7 support for critical systems.
Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through proactive monitoring and automation.

4. FinOps and Cost Optimization
Implement FinOps practices to manage and optimize cloud and infrastructure costs effectively.
Analyze and monitor cloud spending using tools like AWS Cost Explorer, Azure Cost Management, or third-party solutions (e.g., CloudHealth, Spot.io).
Identify opportunities to reduce costs through resource optimization, reserved instances, spot instances, and auto-scaling policies.
Collaborate with finance and engineering teams to establish budgets, forecasts, and cost allocation strategies.
Educate and train teams on cost-aware development and operational practices.

5. Collaboration and Leadership
Act as the primary technical point of contact at the customer site, fostering strong relationships with stakeholders.
Mentor junior engineers and guide them in adopting SRE best practices, including cost optimization.
Collaborate with development teams to embed observability, scalability, reliability, and cost efficiency into the software development lifecycle (SDLC).

6. Compliance and Security
Ensure compliance with security standards and regulatory requirements (e.g., GDPR, HIPAA, SOC 2).
Implement and enforce security best practices across systems and processes.
Conduct regular audits and vulnerability assessments to maintain a secure environment.

Required Qualifications:

Experience:
9+ years of experience in IT operations, DevOps, or Site Reliability Engineering roles.
Proven experience leading SRE initiatives in customer-facing or on-site roles.
Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
Strong understanding of distributed systems, microservices architecture, and serverless computing.
Experience with cloud cost optimization and FinOps practices.

Technical Skills:
Proficiency in automation and Infrastructure as Code tools (e.g., Terraform, CloudFormation).
Expertise in monitoring and observability tools (e.g., Splunk, Dynatrace, Prometheus, Grafana).
Experience with configuration management tools (e.g., Ansible, Puppet, Chef).
Knowledge of scripting and programming languages (e.g., Python, Bash, Go).
Familiarity with database technologies (e.g., MySQL, PostgreSQL, MongoDB).
Hands-on experience with cloud cost management tools (e.g., Azure Cost Management or CloudHealth).
