Site Reliability Engineer

Company: Cynet Systems

Location: Phoenix, AZ 85032

Job Description:

Required Skills:

3-5 years of service reliability/operations experience running large-scale, high-performance applications in a hybrid environment (on-prem and cloud).
3-5 years of experience writing automation scripts and building dashboards for application performance management to manage transaction journeys.
2-4 years of experience working with programming languages such as Go, Python, Java, Rust, etc.
Working knowledge of one or more databases: Oracle, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or any time-series database.
At least 2 years of experience transitioning platforms to the cloud and containerization - GCP, AWS, and Rancher (or CloudFormation, Azure, and OpenShift).
Experience maintaining containerized applications in GKE/RKE/AKS environments.
Experience implementing cloud observability using OTEL to enable real-time monitoring, distributed tracing, and incident resolution.
Experience working with specific GraphQL frameworks (Apollo, Prisma, Hasura, etc.).
Experience applying knowledge of networking protocols and concepts such as TCP/IP, HTTP, DNS, load balancing, and service mesh to troubleshoot issues in high-pressure situations.

Preferred Skills:

Proven experience managing application availability, building creative solutions to automate repetitive activities, and improving gating.
Working knowledge of monitoring tools - Client, AppDynamics, Grafana/Prometheus, and Dynatrace.
Experience with tools like Rally, Confluence, and other CI/CD extenders.
Hands-on experience with implementing in-memory caching solutions.
Experience with Redis DB is a plus.
Excellent debugging skills across a variety of technical platforms integrated through an API gateway.
Hands-on experience with GCS, Cloud SQL, Spanner, and Firestore.
Extensive experience in enterprise-level infrastructure and operations.
Experience in high availability and distributed systems, Linux and Windows administration, troubleshooting, and support.
Experience monitoring and troubleshooting HashiCorp Vault environments, ensuring minimal downtime and rapid recovery from incidents.
Working knowledge of Vertex AI, Gen AI, and BigQuery.