Lead SRE Engineer
Company: Diverse Lynx LLC
Location: Minnetonka, MN 55345
Description:
Job Summary:
The ideal candidate will have expertise in Docker Containers, Grafana, and SRE. This role involves working in a hybrid model with a day shift.
Top Qualifications:
5 years of experience leading and guiding SRE engineers
Digital web products
Containers (Docker) and container orchestration (Kubernetes)
Required Skills:
Docker Containers
Grafana
SRE
Responsibilities:
System Reliability and Performance: Lead and drive end-to-end (Supply Chain) reliability, availability, and performance of applications in Digital Experience.
Monitoring and Alerting: Design, implement, and maintain robust monitoring and alerting systems to proactively identify and resolve issues.
Infra Capacity Planning: Drive capacity planning, ensuring that systems can handle current and future workloads.
Incident Response: Lead and guide Org level application teams in incident response efforts, ensuring quick and effective resolution of issues.
Performance Tuning: Drive and implement best practices and controls to identify bottlenecks and support performance tuning before production rollout.
Post-Incident Reviews: Drive and support post-incident (P1/P2) reviews to identify root causes and prevent future incidents.
Security: Lead application teams in adopting industry-standard best practices for managing security certificates, secrets, and non-user IDs to avoid issues and outages.
Change Management: Implement robust change management processes to ensure that changes to the system are deployed safely and reliably.
Peak Season Readiness: Support Digital teams in preparing for peak season, ensuring overall end-to-end (E2E) system resiliency and redundancy to handle expected peak usage volumes.
War Room Playbooks: Support teams in preparing playbooks that cover war room scenarios.
Auto Failover & Auto Scaling: Lead and support application teams in adopting effective auto failover and auto scaling strategies to maintain overall system resiliency.
Collaboration with Engineers: Work with application development teams to understand their needs, identify potential reliability issues, and improve the software development lifecycle.
Cloud: Define and develop the cloud strategy for the enterprise, focusing on AWS and aligned with IT requirements.
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.