Senior Site Reliability Engineer
Company: Tranzeal Incorporated
Location: Palo Alto, CA 94303
Description:
Key Responsibilities:
- Design and implement infrastructure automation and deployment pipelines using tools such as Terraform, Ansible, and Jenkins
- Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform
- Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure
- Develop and maintain security and compliance policies and procedures for our healthcare AI platform
- Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations
- Implement and maintain disaster recovery and business continuity plans
- Develop and maintain documentation related to infrastructure, deployment, and operations
- Mentor and provide technical guidance to junior engineers
Qualifications:
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
- At least 5 years of professional experience in DevOps engineering or a related field
- Expertise in infrastructure automation and deployment tools such as Terraform, Ansible, Jenkins, or GitLab CI/CD
- Experience with cloud platforms such as AWS, GCP, or Azure
- Strong knowledge of containerization technologies such as Docker and Kubernetes
- Experience with monitoring and logging tools such as ELK, Grafana, or Datadog
- Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault
- Strong problem-solving skills and the ability to work both independently and collaboratively in a team environment
- Excellent communication and interpersonal skills
- Experience implementing HIPAA and SOC 2 compliance is a plus
- Experience working in an HPC environment is a plus