SRE Engineer

Company: Cynet Systems

Location: Hanover, NH 03755

Job Description:
  • Designing, implementing, deploying and running highly available, fault-tolerant, auto-scaling and auto-healing systems.
  • Deep expertise in AWS, Azure, and GCP, including Kubernetes (EKS, ECS, Fargate, GKE) and serverless architectures.
  • Implementing advanced monitoring (Prometheus, Grafana, Datadog, ELK), tracing, logging and automated alerting solutions.
  • Scaling distributed systems, optimising compute/storage efficiency, and cost management.
  • Designing failure simulations to improve system robustness and incident response.
  • Expert in AWS CLI, CloudFormation, Ansible, Helm, and GitOps for automated infrastructure provisioning.
  • Driving reliability best practices across engineering teams, embedding SRE principles into the DevSecOps lifecycle.
  • Partnering with engineering, security, and product teams to balance reliability and feature velocity.
  • Expertise in CIAM and the ForgeRock stack (PingGateway, PingAM, PingIDM, PingDS), with certification or proof of completion of ForgeRock Deep-Dive 400 training.
  • Building and mentoring high-performing SRE teams, fostering a culture of automation and innovation.
  • Defining and enforcing reliability metrics to balance innovation with system stability.
  • Optimising deployment pipelines for high-frequency, zero-downtime releases.
  • Leveraging machine learning for anomaly detection, predictive scaling, and automated remediation.
Skillset Required:
  • 5+ years of hands-on experience configuring, deploying and running ForgeRock COTS-based IAM solutions (PingGateway, PingAM, PingIDM, PingDS) with automated GitOps CI/CD pipelines using GitLab.
  • Design and hands-on implementation of GitOps CI/CD pipelines, automated failover, data backup and restore solutions.
  • Automating telemetry and dashboards.
  • 10+ years of experience running disaster recovery and zero-downtime deployment solutions.
  • Designing and implementing continuous delivery.
  • Hands-on coding in Python, Bash and JSON/YAML (configuration as code, CaC).
  • Supporting large-scale, distributed, cloud-based microservice and API service solutions with 99.9%+ uptime.
