Sr. SRE w IAM & Cloud Consultant at Whippany, NJ - Onsite

Company: SysMind Tech

Location: Whippany, NJ 07981

Description:

Position: Sr. SRE w IAM & Cloud Consultant

Location: Whippany, NJ - Onsite

Position Type: Long Term Contract Position

Key Responsibilities:
  • Designing, implementing, deploying and running highly available, fault-tolerant, auto-scaling and auto-healing systems.
  • Deep expertise in AWS, Azure, and GCP, including Kubernetes and container platforms (EKS, ECS, Fargate, GKE) and serverless architectures.
  • Implementing advanced monitoring (Prometheus, Grafana, Datadog, ELK), tracing, logging and automated alerting solutions.
  • Scaling distributed systems, optimizing compute/storage efficiency, and managing cost.
  • Designing failure simulations to improve system robustness and incident response.
  • Expert in AWS CLI, CloudFormation, Ansible, Helm, and GitOps for automated infrastructure provisioning.
  • Driving reliability best practices across engineering teams, embedding SRE principles into the DevSecOps lifecycle.
  • Partnering with engineering, security, and product teams to balance reliability and feature velocity.
  • Expertise in CIAM and the ForgeRock stack (PingGateway, PingAM, PingIDM, PingDS), with certification or proof of completion of ForgeRock Deep-Dive 400 trainings.
  • Building and mentoring high-performing SRE teams, fostering a culture of automation and innovation.
  • Defining and enforcing reliability metrics to balance innovation with system stability.
  • Optimizing deployment pipelines for high-frequency, zero-downtime releases.
  • Leveraging machine learning for anomaly detection, predictive scaling, and automated remediation.

Required Skills:
  • 5+ years' hands-on experience configuring, deploying and running ForgeRock COTS-based IAM solutions (PingGateway, PingAM, PingIDM, PingDS) with automated GitOps CI/CD pipelines using GitLab.
  • Design and hands-on implementation of GitOps CI/CD pipelines, automated failover, and data backup and restore solutions.
  • Automating telemetry and dashboards.
  • 10+ years' experience running disaster recovery and zero-downtime deployment solutions.
  • Designing and implementing continuous delivery.
  • Hands-on coding in Python, Bash and JSON/YAML (CaC).
  • Supporting large-scale, distributed, cloud-based microservice and API service solutions with 99.9%+ uptime.