SRE (Site Reliability Engineer)
Company: SchoolAI
Location: Lehi, UT 84043
Description:
About SchoolAI
We make school awesome every day for students and the people supporting them by finding out what they need and making it happen.
Values
We want everyone at SchoolAI to make a competitive wage, have fun, and do good. As we build out the team, we optimize for, and invest in, great people who:
- Make Magic: You're making magic when you dream big, sweat the details, make the complex look elegant, and the final result is one that puts a smile on everyone's face. People are drawn to working with you and your work.
- Rise to the Moment: You're rising to the moment when you show up, answer the call, make the extra effort, and appreciate the fact that we are working with the coolest technology, in the biggest space it'll impact, and we're in pole position.
- Connect the Dots: You've done the work to prioritize, understand, and tie your work to your team, to students, and to the people who support them. You're driving tangible outcomes and can connect your work to meaningful, tangible business results and goals. You're connecting the dots when you've thought deeply and have a throughline narrative for your work.
- Make High-Leverage Bets: Always consider the board in front of you, using data, instinct, honed expertise, and user problems to pick the best move and maximize the impact of every resource. When the moment is right, go all in.
- Simplify and Go: Cut through the noise. Focus on what's essential and act with urgency. Speed is your ally, simplicity your tool. Embrace clarity and let it guide you to swift victories.
We use our core values to make decisions, evaluate work, and win. We expect every team member to demonstrate these values. In turn, we'll invest in you and your impact at SchoolAI.
About the role
- Architect and implement resilient, scalable system designs across our cloud infrastructure (primarily GCP)
- Design and evolve our infrastructure architecture with a focus on scalability, observability, and performance optimization
- Establish architectural patterns and guardrails that promote reliability while enabling rapid development
- Develop well-architected internal tooling and abstractions to streamline deployment, monitoring, and debugging
- Lead incident response with an architectural mindset, conducting thorough root cause analysis and architecting systemic improvements
- Partner with product, ML, and engineering teams to deeply understand their requirements and design appropriate infrastructure solutions
- Champion infrastructure-as-code, automated testing, and continuous delivery through thoughtful architecture decisions
- Own and architect key components of our CI/CD pipelines and platform engineering efforts
We'd love to hear from you if you:
- Have strong architectural thinking, demonstrated through designing, implementing, and documenting resilient, scalable systems in production
- Have 3+ years of experience in site reliability engineering with a focus on systems architecture
- Are experienced with cloud platforms (particularly GCP) and container orchestration (Kubernetes)
- Are proficient with infrastructure-as-code (Terraform), Git/GitHub, and CI/CD pipelines (especially GitHub Actions)
- Have experience with containerization (Docker) and PR management tools (Graphite)
- Are familiar with Node.js, JavaScript, and TypeScript environments
- Excel at monitoring and observability implementation (particularly Datadog) as part of system architecture
- Have knowledge of networking concepts and security best practices
- Possess experience with or interest in database management and optimization
- Have led or participated in large-scale cloud migration projects
- Can articulate technical trade-offs and architectural vision effectively across teams
- Demonstrate problem-solving skills for complex infrastructure issues
- Are passionate about automation, designing comprehensive solutions that reduce toil through elegant code