Site Reliability Engineer
Company: Cynet Systems
Location: Phoenix, AZ 85032
Job Description:
Required Skills:
- 3-5 years of service reliability/operations experience running large-scale, high-performance applications in a hybrid environment (on-prem and cloud).
- 3-5 years of experience writing automation scripts and building dashboards for application performance management to manage transaction journeys.
- 2-4 years of experience working with programming languages such as Go, Python, Java, Rust, etc.
- Working knowledge of one or more databases: Oracle, SQL Server, Redis, Clickhouse, PostgreSQL, MongoDB, or any time-series databases.
- At least 2 years of experience transitioning platforms to the cloud and containerization: GCP, AWS, and Rancher (or CloudFormation, Azure, and OpenShift).
- Experience maintaining containerized applications in GKE/RKE/AKE environments.
- Experience implementing cloud observability using OTEL to enable real-time monitoring, distributed tracing, and incident resolution.
- Experience working with specific GraphQL frameworks (Apollo, Prisma, Hasura, etc.).
- Experience applying knowledge of networking protocols and concepts such as TCP/IP, HTTP, DNS, load balancing, and service mesh to troubleshoot issues in high-pressure situations.
- Proven experience managing application availability, building creative solutions to manage repetitive activities, and improving gating.
- Working knowledge of monitoring tools - Client, AppDynamics, Grafana/Prometheus, and Dynatrace.
- Experience with tools like Rally, Confluence, and other CI/CD extenders.
- Hands-on experience with implementing in-memory caching solutions.
- Experience with Redis DB is a plus.
- Excellent debugging skills across a variety of integrated technical platforms on API gateway.
- Hands-on experience with GCS, Cloud SQL, Spanner, and Firestore.
- Extensive experience in enterprise-level infrastructure and operations.
- Experience in high availability and distributed systems, Linux and Windows administration, troubleshooting, and support.
- Experience monitoring and troubleshooting HashiCorp Vault environments, ensuring minimal downtime and rapid recovery from incidents.
- Working knowledge of Vertex AI, Gen AI, and BigQuery.