Sr. Site Reliability Engineer, Bare Metal, Infrastructure
Company: Tesla, Inc
Location: Austin, TX 78745
Description:
Tesla Cloud as a Service seeks a high-impact Site Reliability Engineer (SRE) to support our bare-metal provisioning platform at scale. You'll provide direct support to internal customers, resolve complex provisioning issues, and escalate systemic problems to engineering. Your focus: ensuring reliable, automated delivery of bare-metal infrastructure using Kubernetes, Metal, and industry-standard tooling across diverse hardware from Supermicro, HPE, and Dell.
Responsibilities
- Provide frontline support for Tesla Cloud Metal as a Service (MaaS) customers provisioning bare-metal servers
- Troubleshoot and resolve hardware, firmware, network, and provisioning failures (PXE, DHCP, VLAN, BMC)
- Automate image builds (Packer, QCOW2), server configurations (Ansible), and deployment workflows
- Manage and maintain large-scale Kubernetes and Metal-powered provisioning pipelines
- Interface with BMCs via Redfish for remote management, firmware updates, and recovery actions
- Escalate recurring issues and feature requests to engineering teams for roadmap improvements
- Participate in a 24/7 on-call rotation to ensure high availability of the MaaS platform
- Own observability: implement monitoring, alerting, and logging for critical systems
Requirements
- Advanced proficiency in Golang and Python for automation and tooling
- Deep Linux expertise (Ubuntu 22.04/24.04) with strong system internals knowledge
- Proven experience with bare-metal provisioning at scale using Kubernetes and Metal
- In-depth knowledge of PXE booting, DHCP, TFTP, and VLAN tagging
- Strong understanding of BMC firmware management and Redfish API operations
- Skilled in infrastructure-as-code (Ansible), CI/CD workflows (GitHub Actions, Jenkins), and artifact management (Artifactory)
- Experience supporting Supermicro, HPE, and Dell hardware in production environments
- Ability to debug complex, cross-layer issues involving hardware, network, and software
- Habitual documenter and knowledge sharer; committed to operational excellence
- Bachelor's Degree in Computer Science, Engineering, or equivalent practical experience
Compensation and Benefits
Benefits
Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits at day 1 of hire:
- Aetna PPO and HSA plans: 2 medical plan options with $0 payroll deduction
- Family-building, fertility, adoption and surrogacy benefits
- Dental (including orthodontic coverage) and vision plans, both of which have options with a $0 paycheck contribution
- Company-paid Health Savings Account (HSA) contribution when enrolled in the high-deductible Aetna medical plan with HSA
- Healthcare and Dependent Care Flexible Spending Accounts (FSA)
- 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
- Company-paid basic life, AD&D, short-term and long-term disability insurance
- Employee Assistance Program
- Sick and vacation time (flex time for salaried positions) and paid holidays
- Back-up childcare and parenting support resources
- Voluntary benefits, including critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance
- Weight Loss and Tobacco Cessation Programs
- Tesla Babies program
- Commuter benefits
- Employee discounts and perks program