Sr Site Reliability Engineer

Company: The Walt Disney Company

Location: Glendale, CA 91205

Description:

Job Posting Title:
Sr Site Reliability Engineer

Req ID:
10117258

Job Description:

"We Power the Magic!" That's our motto at Disney Experiences (DX). Our team creates world-class immersive digital experiences for the Company's premier vacation brands including Disney's Parks & Resorts worldwide, Disney Cruise Line, Aulani, a Disney Resort & Spa, and Disney Vacation Club.

This team focuses on empowering storytelling through innovative solutions. Its goals are to:
  • Reduce or eliminate guest-impacting incidents and outages across the Disney Experiences portfolios
  • Allow the product teams to focus on development and enhancement of our products


Operational duties include, but are not limited to:
  • Providing 24/7 on-call support as required
  • Monitoring and optimizing site reliability for optimal performance
  • Implementing automation to eliminate repetitive tasks
  • Managing incidents and conducting root cause analysis
  • Remediating security issues of our services in the cloud
  • Troubleshooting cloud infrastructure, application, and network issues
  • Maintaining comprehensive technical documentation


Engineering duties include:
  • Provisioning reliable cloud infrastructure services
  • Architecting active-active multi-region systems
  • Eliminating single points of failure
  • Planning capacity and auto-scaling cloud services
  • Designing, building, and integrating software
  • Monitoring cloud applications (VMs/Containers/Kubernetes)
  • Mentoring operations team members in their areas of expertise


Qualities we are looking for:
  • You like working with clients - you will work with customers/product engineering to gather requirements. You like hearing stories.
  • You have a passion for improvement - you enjoy improving processes (e.g., through less code, fewer manual steps, fewer systems, improving velocity).
  • You are law-abiding but an agent of change - you will advocate compliance with known standards and engage engineers to improve upon processes.
  • You are a team player - you mentor others and contribute support documentation; here, heroes work at enriching the team
  • You can multitask - you are action-oriented and capable of working on concurrent projects
  • You have a developer mindset and are comfortable writing code
  • You have an operations mindset and some experience maintaining production systems


Expectations:
  • Responsible for creating breakdown of tasks to meet project objectives
  • Responsible for on-time ticket and task completion


The growth path will have you:
  • Responsible for turning strategy into multiple project objectives
  • Accountable for/teaching other engineers how to create breakdowns of tasks
  • Accountable for/teaching other engineers how to complete tasks
  • Responsible for sharing your work/experiences with the greater org


Technical Requirements:
  • 5+ years relevant, progressive experience in the reliability engineering space.
  • Build/release skills: Work with product development teams to maintain SDLC pipelines.
  • Expert monitoring skills: Ensure monitoring tools effectively alert on guest-facing issues.
  • Expert team communication skills: Ensure team understands and approves solutions.
  • Expert technical fundamentals: Mastery of Unix system administration duties.
  • Experience in the public cloud: Proficient with launching products on platforms like Google Cloud, AWS, Azure, Salesforce, and private clouds.
  • Experience in Infrastructure as Code (IaC): Adopt an IaC mindset (Terraform, Helm, Chef).


Preferred Technical Qualifications (listed in order of importance):
  • At least one of the following languages: Golang or Python
  • Container orchestration: Kubernetes, ECS, App Engine
  • Building Docker images
  • Alerting and monitoring: using AppDynamics, Splunk, Grafana, etc.
  • CI/CD (with Jenkins or Harness, GitHub or GitLab)
  • SDLC Build and Release processes


Additional knowledge of/with:
  • Languages: Node.js, Java
  • AWS Cloud (Fargate, ECS, Lambda, API Gateway, EC2, S3, ALB/ELB, ElastiCache, EKS, KMS, Secrets Manager, VPCs, IAM)
  • Google Cloud Platform (App Engine, Kubernetes (Helm/Tiller), Cloud Functions, Firebase, IAM)
  • Logging/Monitoring/Alerting (CloudWatch/Splunk/AppDynamics/ElastiCache/Grafana)
  • Terraform/Atlantis
  • Rundeck, Chef, Ansible, Vault
  • Message queueing: RabbitMQ, Pub/Sub
  • Load balancers


Required Education:
  • Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent experience.


Preferred Education:
  • Master's degree in Computer Science, Information Systems, or a related field, or equivalent experience.


#DISNEYTECH

The hiring range for this position in Glendale, CA/Anaheim, CA is $138,800 - $186,100 per year and in Seattle, WA is $145,400 - $195,000 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate's geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.

Job Posting Segment:
Technology & Digital

Job Posting Primary Business:
Tech Delivery, Platforms, & Core Systems

Primary Job Posting Category:
Site/System Reliability Engineer

Employment Type:
Full time

Primary City, State, Region, Postal Code:
Orlando, FL, USA

Alternate City, State, Region, Postal Code:
USA - CA - 1200 Grand Central Ave, USA - CA - Disneyland Service - Bldg 700 Complex, USA - WA - 925 4th Ave

Date Posted:
2025-04-04