Site Reliability Engineer
Company: Upfront Healthcare
Location: Chicago, IL 60629
Department: Engineering
Employment Type: Full Time
Location: Chicago, Illinois
Description
As a Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, performance, and scalability of our systems and services. You will work closely with support, development, and operations teams to build and maintain the infrastructure that supports our applications, ensuring they run smoothly and efficiently. Your role will involve automating processes, implementing best practices, and responding to incidents to minimize downtime and improve overall system reliability.
Role Responsibilities
- Automation: Develop and implement automation scripts and tools to streamline operations, deployments, and monitoring.
- Monitoring and Alerting: Set up comprehensive monitoring and alerting systems to proactively detect and resolve issues before they impact users.
- Incident Management: Respond to and resolve incidents quickly, conducting root cause analysis and implementing preventive measures.
- Performance Optimization: Identify and address performance bottlenecks and other systemic issues to ensure smooth operation of services.
- Documentation: Maintain clear and detailed documentation of system configurations, processes, and procedures.
- Continuous Improvement: Identify areas for improvement and innovation in cloud architecture, processes, and tools, and drive initiatives to enhance performance, reliability, and efficiency.
- Off-Hours Support: Provide off-hours support for critical incidents that may impact our clients.
Role Related PHI Access
- This role requires access to all client accounts for purposes of client implementations as well as client environment QA/UAT processes.
Qualifications
- Education: Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Experience: 3+ years of experience in a Site Reliability Engineer, Support Engineer, or similar role.
- Technical Proficiency:
  - Proficiency in programming and scripting on platforms such as .NET and Node.js.
  - Experience with infrastructure as code (IaC) tools like Terraform, Ansible, or CloudFormation.
  - Knowledge of cloud platforms (e.g., AWS and Azure).
  - Experience with containerization and orchestration tools like Docker and Kubernetes.
  - Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Datadog.
  - Experience with managing and optimizing databases (e.g., MySQL, PostgreSQL, MongoDB).
- Problem-Solving Skills: Strong analytical and troubleshooting skills, with a proactive approach to problem-solving.
- Communication: Excellent verbal and written communication skills, with the ability to collaborate effectively with cross-functional teams.
- Adaptability: Ability to work in a fast-paced environment and manage multiple priorities simultaneously.
Benefits
- Competitive salary
- Stock options
- Medical, Vision, and Dental
- 401(k)
- FSA and HSA
- Employer-paid short-term and long-term disability
- Life insurance
- Education reimbursement, adoption assistance, health & wellness perks, and training & development courses
- Commuter benefits
- Flexible PTO policy
- 14 paid company holidays
- Paid personal quarterly community service day