Sr. Site Reliability Engineer, Bare Metal, Infrastructure

Company: Tesla, Inc

Location: Austin, TX 78745

Description:

Tesla Cloud as a Service seeks a high-impact Site Reliability Engineer (SRE) to support our bare-metal provisioning platform at scale. You'll provide direct support to internal customers, resolve complex provisioning issues, and escalate systemic problems to engineering. Your focus: ensuring reliable, automated delivery of bare-metal infrastructure using Kubernetes, Metal, and industry-standard tooling across diverse hardware from Supermicro, HPE, and Dell.

Responsibilities
  • Provide frontline support for Tesla Cloud Metal as a Service (MaaS) customers provisioning bare-metal servers
  • Troubleshoot and resolve hardware, firmware, network, and provisioning failures (PXE, DHCP, VLAN, BMC)
  • Automate image builds (Packer, QCOW2), server configurations (Ansible), and deployment workflows
  • Manage and maintain large-scale Kubernetes and Metal-powered provisioning pipelines
  • Interface with BMCs via Redfish for remote management, firmware updates, and recovery actions
  • Escalate recurring issues and feature requests to engineering teams for roadmap improvements
  • Participate in a 24/7 on-call rotation to ensure high availability of the MaaS platform
  • Own observability: implement monitoring, alerting, and logging for critical systems


Requirements
  • Advanced proficiency in Golang and Python for automation and tooling
  • Deep Linux expertise (Ubuntu 22.04/24.04) with strong system internals knowledge
  • Proven experience with bare-metal provisioning at scale using Kubernetes and Metal
  • In-depth knowledge of PXE booting, DHCP, TFTP, and VLAN tagging
  • Strong understanding of BMC firmware management and Redfish API operations
  • Skilled in infrastructure-as-code (Ansible), CI/CD workflows (GitHub Actions, Jenkins), and artifact management (Artifactory)
  • Experience supporting Supermicro, HPE, and Dell hardware in production environments
  • Ability to debug complex, cross-layer issues involving hardware, network, and software
  • Habitual documenter and knowledge sharer; committed to operational excellence
  • Bachelor's Degree in Computer Science, Engineering, or equivalent practical experience


Compensation and Benefits

Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits from day 1 of hire:
  • Aetna PPO and HSA plans, including 2 medical plan options with $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental (including orthodontic coverage) and vision plans, both with options that have a $0 paycheck contribution
  • Company-paid Health Savings Account (HSA) contribution when enrolled in the high-deductible Aetna medical plan with HSA
  • Healthcare and Dependent Care Flexible Spending Accounts (FSA)
  • 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
  • Company-paid basic life, AD&D, short-term and long-term disability insurance
  • Employee Assistance Program
  • Sick and vacation time (flex time for salaried positions), and paid holidays
  • Back-up childcare and parenting support resources
  • Voluntary benefits, including critical illness, hospital indemnity, accident insurance, theft and legal services, and pet insurance
  • Weight Loss and Tobacco Cessation Programs
  • Tesla Babies program
  • Commuter benefits
  • Employee discounts and perks program
