Lead Site Reliability Engineer
Company: Massachusetts Institute of Technology
Location: Cambridge, MA 02139
Description:
LEAD SITE RELIABILITY ENGINEER, Office of Research Computing and Data (ORCD), to build and advance SRE functions in collaboration with a diverse team of systems engineers; play a pivotal part in the strategic transformation of infrastructure planning, design, delivery, and operations in support of ORCD's continued growth; and build and foster cross-functional collaboration between engineering and operations teams across MIT, ensuring alignment with institutional objectives and long-term strategic initiatives.
Find the full job description here: https://orcd.mit.edu/about-orcd/jobs
Job Requirements
REQUIRED: Bachelor's degree in engineering, computer science, or a related field, or equivalent industry experience; a minimum of seven years of experience in site reliability engineering or a related field; deep and broad expertise across multiple technical domains, including Linux, networking, and virtualization; ability to drive innovation in system architecture and lead transformative design initiatives from the ground up; robust analytical and structured problem-solving skills, coupled with excellent communication and interpersonal abilities; deep understanding of Linux, LDAP, virtualization, and configuration management in a large Linux-based engineering environment.
PREFERRED: 10+ years of experience in site reliability engineering; experience working within an HPC/research computing environment; ability to analyze network traffic to identify technical issues and suspicious activities.
Job #24909-11
4/8/2025