Technical Lead – SRE

Company: Tetrate

Location: Milpitas, CA 95035

Description:

Don't just follow the industry; define it. Tetrate, creators of Envoy Gateway and Envoy AI Gateway, and architects of industry-standard security practices (SPIFFE and NGAC), is building a world-class field engineering team. Are you ready to build applications that power the global economy and Fortune 150 companies and that protect national security? We're looking for a Technical Lead SRE who will apply cloud operations practices across our hybrid environments, improve customer outcomes, and own the operational roadmap.

Tetrate seeks an outcome-driven, technically adept Technical Lead SRE to champion our enterprise customers, demonstrating how we solve Layer 7 challenges and security vulnerabilities.

Responsibilities:

  • Operational Excellence & Incident Management
    • Improve MTTD and MTTR through enhanced monitoring, logging, and alerting.
    • Establish SRE practices, build operational dashboards, and maintain runbooks.
    • Enhance the customer experience by working with CRE team members to apply SRE best practices.
    • Use tools like Prometheus, Grafana, Datadog, OpenTelemetry, and Elastic Stack for observability.
    • Automate health checks and incident response with Terraform, Ansible, Helm, and Kubernetes.
  • Customer Engagement & Architecture Review
    • Analyze customer architectures and operational practices.
    • Identify themes from escalations and map them to architectural gaps or operational improvements.
    • Provide tailored recommendations and help implement improvement plans for customers' environments.
    • Develop standard operating procedures (SOPs) for deployment, maintenance, and incident handling in customer environments.
    • Provide proactive guidance on performance tuning, disaster recovery (DR) strategies, and scaling mechanisms.
    • Establish secure connectivity and seamless integration between the hosted management plane and customer environments.
    • Lead root cause analysis (RCA) and propose long-term solutions for recurring issues.
  • Product & Hybrid Architecture Optimization
    • Apply cloud practices (CI/CD, GitOps) to hybrid and on-prem environments.
    • Apply cloud best practices (e.g., AWS Well-Architected Framework) to enhance both internal product development and customer environments.
    • Build custom plugins and automation scripts to meet customer needs and extend flagship product capabilities.
    • Collaborate with product teams to implement metrics improvements, UI enhancements, and alerts for hosted solutions.
  • Ownership of Hosted Operations
    • Develop and execute an operational plan for hosted environments, including monitoring, alerts, and product improvements.
    • Take ownership of getting on-prem customers to implement hosted operational improvements, ensuring alignment with hosted best practices.
  • Collaboration and Leadership
    • Partner with developer, platform, and security teams to align operational goals with product roadmaps.
    • Mentor other engineers on cloud-native operations best practices, focusing on Zero Trust principles.
    • Drive continuous improvement through automation, Shift-Left initiatives, and SRE (Site Reliability Engineering) methodologies.

Required Skills:

    • 8+ years of experience in Cloud Operations, SRE, or DevOps roles.
    • Strong hands-on experience with Kubernetes, Istio, Envoy, Gateway, load balancers, and hybrid architectures.
    • Hands-on experience with cloud platforms such as AWS, GCP, or Azure and knowledge of hybrid/cloud-native architectures.
    • Strong analytical and troubleshooting skills with experience in Postgres, Elastic DB, and GraphQL queries.
    • Experience building CI/CD pipelines with tools like GitHub Actions or ArgoCD.
    • Familiarity with on-prem deployments and integration with public cloud hosted services.
    • Familiarity with LDAP, OIDC, and SAML authentication and security configurations.
    • Ability to collaborate with customers and cross-functional teams to drive operational improvements.
    • Experience with CI/CD, GitOps practices, and networking concepts.
    • Prior exposure to multi-cloud deployments and hybrid architectures with VM and container-based workloads.
    • Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders.
    • Prior experience interacting directly with enterprise customers for operational troubleshooting and architecture reviews.
    • 5+ years of experience in Python, Golang (Go).
    • 3-5+ years of Bash / Shell scripting.
    • 1-2 years of JavaScript or TypeScript.
    • 2-3 years of Infrastructure as Code tools like Terraform.
    • Good familiarity with YAML/JSON.

If you're looking for a job, this isn't it. If you're ready to be part of something larger than yourself, connect with us.

Locations: We're a fully distributed team with a global presence in 15 countries. While this role requires North American timezone coverage, we welcome exceptional talent from anywhere. Visa sponsorship (H1B) is supported.



