Site Reliability Engineer (SRE)


Company: ASB Resources

Location: Middlesex, NJ 08846

Description:

NYC / NJ location (Metropark, Iselin, NJ); 3 days a week in the office

Duration: 1+ year

Job Description:

10+ years of Software Engineering and Architecture experience, with at least 5 years of SRE-focused experience in Production Support, Application Support, and DevOps implementation.

Demonstrated experience enabling SRE principles and practices with technical and operations teams at different SRE maturity levels across the Engineering and Operations space.
Demonstrated experience influencing design committees and process teams to establish standards that improve approaches and maturity across IT teams.
Work closely with Infrastructure Services and product teams to develop reliable solutions that improve availability, scalability, and performance against targets.
Experience across the SDLC, from architecture and software design, SLA/SLO definitions, tech-debt reviews, and CI/CD releases to KPI monitoring and DevOps principles.
Experience in production systems: analyzing performance and error metrics, leading triage and troubleshooting exercises, and tracking incident management targets (MTTx).
Strong experience with infrastructure and application technology components and designs: assessing problem areas (logs/events), supporting analysis (metrics/traces), and recommending solutions.
Hands-on experience coding and developing automation solutions using API-based integrations, and configuration using Ansible and Terraform for IaaS solutions.
Experience working with microservices and containerized platforms, supporting them through monitoring, alerting, and troubleshooting as part of service operations.
Technical knowledge of and experience in cloud architectures, hybrid cloud, and cloud-native solutions, applying reliable cloud designs to improve operational efficiency.
Experience working in incident management, leveraging postmortem analysis and developing reliable solutions as part of driving multiple incident management initiatives.
Experience with observability tools and frameworks, the concept of golden signals, and MELT (metrics, events, logs, traces) data integration and analysis using market solutions to improve operational efficiency.
Experience managing and growing teams to achieve short-term and long-term goals as part of the SRE roadmap, aligned with SRE strategic goals.
Experience partnering with multiple peers and stakeholders, and the ability to interact with leadership and technical teams at different levels.

Ability to adapt and support multiple application and infrastructure groups with their SRE needs in a fast-paced, dynamic, and growing organization.

Must Have:

10+ years of overall IT experience focusing on Software Engineering, Architecture and/or supporting Production technologies.
5-7+ years of monitoring analysis experience using any observability solution, such as Splunk, Dynatrace, New Relic, Grafana, or Datadog.
5+ years of development/coding experience building engineering solutions for large-scale, mission-critical applications.
5+ years of hands-on experience as an SRE lead or individual contributor delivering on SRE goals and objectives across IT groups.
5+ years of experience working with Kubernetes platforms and public cloud (AWS, GCP, Azure) to support implementation or operational needs.
