Sr. Site Reliability Engineer

Company: Charles Schwab

Location: Southlake, TX 76092

Description:

Your Opportunity

At Schwab, you are empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us "challenge the status quo" and transform the finance industry together. As a member of the CET SAvE organization, you will join the Production Operations team for Schwab's Mobile Application while driving the adoption of Site Reliability Engineering (SRE) best practices. In this critical role, you will shape automation, tooling, observability, and reliability strategies across engineering teams to enhance service health and performance.

What You'll Do
  • Lead and Optimize Reliability - Drive tactical and strategic initiatives to improve service health, performance, and availability for Schwab's Mobile Application.
  • Champion SRE Best Practices - Implement key operational methodologies, including SLIs, SLOs, error budgets, blameless postmortems, and capacity planning.
  • Enhance Observability & Automation - Develop and improve monitoring, telemetry, and alerting systems to proactively detect and resolve issues, reducing MTTD and MTTR.
  • Drive Tooling & DevOps Innovation - Design and implement automation solutions that reduce toil, streamline deployments, and improve overall system resilience.
  • Collaborate Cross-Functionally - Partner closely with Mobile Engineering, DevOps, and Infrastructure teams to enhance scalability, security, and reliability.
  • Provide On-Call Support - Participate in an on-call rotation to ensure the reliability of Schwab's Retail Web and Mobile applications.


What You Have

Required Qualifications
  • Bachelor of Science or equivalent in Computer Science or a related field.
  • 8+ years of experience in software development and site reliability engineering (SRE), with a strong focus on cloud technologies.
  • 8+ years in DevOps engineering, with expertise in automating production operations and developing self-healing systems.
  • 8+ years of hands-on experience with CI/CD tools and with logging, observability, and telemetry solutions such as Bitbucket, Bamboo, GitHub, Jenkins, AppDynamics, Splunk, Prometheus, and Grafana.
  • 5+ years of experience implementing SRE principles, including SLIs, SLOs, error budgets, monitoring, blameless postmortems, and toil reduction.

Preferred Qualifications
  • Strong proficiency in programming and automation using Python, Java, CloudFormation, or Terraform for Infrastructure-as-Code (IaC) solutions.
  • Familiarity with cloud infrastructure platforms (AWS, GCP, and Azure).
  • Deep understanding of Compute, Storage, Networking, Load Balancing, CDN, DNS, and Security stacks in cloud environments.
  • Ability to work independently in a fast-paced, high-impact environment while collaborating effectively across teams.
  • Excellent verbal and written communication skills, with the ability to convey complex technical concepts to both technical and non-technical stakeholders.


In addition to the salary range, this role is also eligible for bonus or incentive opportunities.
