Sr Site Reliability Engineer

Company: The Walt Disney Company

Location: Glendale, CA 91205

Description:

Job Posting Title:
Sr Site Reliability Engineer

Req ID:
10117258

Job Description:

"We Power the Magic!" That's our motto at Disney Experiences (DX). Our team creates world-class immersive digital experiences for the Company's premier vacation brands including Disney's Parks & Resorts worldwide, Disney Cruise Line, Aulani, a Disney Resort & Spa, and Disney Vacation Club.

This team focuses on empowering that storytelling through innovative solutions.

Reduce or eliminate guest-impacting incidents and outages across the Disney Experiences portfolios
Allow the product teams to focus on development and enhancement of our products

Operational duties include, but are not limited to:

Providing 24/7 on-call support as required
Monitoring and optimizing site reliability for optimal performance
Implementing automation to eliminate repetitive tasks
Managing incidents and conducting root cause analysis
Remediating security issues of our services in the cloud
Troubleshooting cloud infrastructure, application, and network issues
Maintaining comprehensive technical documentation

Engineering duties include:

Provisioning reliable cloud infrastructure services
Architecting active-active multi-region systems
Eliminating single points of failure
Planning capacity and auto-scaling cloud services
Designing, building, and integrating software
Monitoring cloud applications (VMs/Containers/Kubernetes)
Mentoring operations team members in their areas of expertise

Qualities we are looking for:

You like working with clients - you will work with customers/product engineering to gather requirements. You like hearing stories.
You have a passion for improvement - you enjoy improving processes (e.g. through less code, fewer manual steps, fewer systems, improved velocity).
You are law-abiding but an agent of change - you will advocate compliance with known standards and engage engineers to improve upon processes.
You are a team player - you mentor others and contribute support documentation; here, heroes work at enriching the team
You can multitask - you are action-oriented and capable of working on concurrent projects
You have a developer mindset and are comfortable writing code
You have an operations mindset and some experience maintaining production systems

Expectations:

Responsible for creating breakdown of tasks to meet project objectives
Responsible for on-time ticket and task completion

The growth path will have you:

Responsible for turning strategy into multiple project objectives
Accountable for/teaching other engineers how to create breakdowns of tasks
Accountable for/teaching other engineers how to complete tasks
Responsible for sharing your work/experiences with the greater org

Technical Requirements:

5+ years relevant, progressive experience in the reliability engineering space.
Knowledge of Build/Release skills: Work with product development teams to maintain SDLC pipelines.
Expert monitoring skills: Ensure monitoring tools effectively notify the team of guest-facing issues.
Expert team communication skills: Ensure team understands and approves solutions.
Expert technical fundamentals: Mastery of Unix system administration duties.
Experience in the public cloud: Proficient with launching products on platforms like Google, AWS, Azure, Salesforce, and private clouds.
Experience in Infrastructure as Code (IaC): Adopt an IaC mindset (Terraform, Helm, Chef).

Preferred Technical Qualifications (listed in order of importance):

At least one of the following languages: Golang or Python
Container orchestration: Kubernetes, ECS, AppEngine
Building Docker images
Alerting and Monitoring: using AppDynamics, Splunk, Grafana, etc.
CI/CD (with Jenkins or Harness, GitHub or GitLab)
SDLC Build and Release processes

Additional knowledge of/with:

Languages: Node.js, Java
AWS Cloud (Fargate, ECS, Lambda, API Gateway, EC2, S3, ALB/ELB, ElastiCache, EKS, KMS, Secrets Manager, VPCs, IAM)
Google Cloud Platform (App Engine, Kubernetes (Helm/Tiller), Cloud Functions, Firebase, IAM)
Logging/Monitoring/Alerting (CloudWatch/Splunk/AppDynamics/ElastiCache/Grafana)
Terraform/Atlantis
Rundeck, Chef, Ansible, Vault
Message Queueing: RabbitMQ, Pub/Sub
Load balancers

Required Education:

Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent experience.

Preferred Education:

Master's degree in Computer Science, Information Systems, or a related field, or equivalent experience.

#DISNEYTECH

The hiring range for this position in Glendale, CA/Anaheim, CA is $138,800 - $186,100 per year and in Seattle, WA is $145,400 - $195,000 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate's geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.

Job Posting Segment:
Technology & Digital

Job Posting Primary Business:
Tech Delivery, Platforms, & Core Systems

Primary Job Posting Category:
Site/System Reliability Engineer

Employment Type:
Full time

Primary City, State, Region, Postal Code:
Orlando, FL, USA

Alternate City, State, Region, Postal Code:
USA - CA - 1200 Grand Central Ave, USA - CA - Disneyland Service - Bldg 700 Complex, USA - WA - 925 4th Ave

Date Posted:
2025-04-04
