Site Reliability Engineer
Company: Compunnel Software Group
Location: Montreal, QC H1A 0A1
Description:
Role Summary:
We are seeking a Site Reliability Engineer (SRE) to support and enhance the reliability engineering, operations, and customer support for our ServiceNow SaaS platform. This is a hybrid role combining automation, process improvement, and production support with a strong emphasis on building and maintaining reliable and scalable systems. As part of a global SRE community, you'll collaborate with diverse teams and stakeholders to optimize system performance, resolve incidents, and drive service excellence.
The ideal candidate brings a blend of development skills, a problem-solving mindset, and a passion for operational excellence. Whether you come from a development, infrastructure, or systems administration background, if you're eager to apply SRE principles and deliver measurable improvements, we encourage you to apply.
Key Responsibilities:
- Drive improvements in availability, performance, and scalability for the ServiceNow SaaS platform by optimizing and automating operational tasks.
- Collaborate with global SRE colleagues to develop observability tools (metrics, logging, tracing, dashboards) that monitor and define product reliability.
- Engage in incident response and resolution, primarily for ServiceNow and occasionally for Linux-based on-premises infrastructure.
- Participate in a global on-call rotation, ensuring timely response and remediation during incidents (time off in lieu offered).
- Contribute to knowledge documentation and ongoing efforts to understand and map dependencies in ServiceNow and associated systems.
- Identify, prioritize, and address technical debt that hinders performance, reliability, or client satisfaction.
- Collaborate in architecture reviews, process delivery improvements, and operational tooling development to support SRE goals.
- Provide constructive feedback on policies and operational processes to continuously improve service delivery and team effectiveness.
Required Skills & Qualifications:
- Minimum 7 years of relevant experience in software development, system administration, or infrastructure operations.
- Strong proficiency in at least one programming/scripting language (e.g., Python).
- Excellent troubleshooting skills across ServiceNow and Linux-based systems.
- Strong interpersonal and communication skills; capable of building positive, productive relationships across teams.
- Proven dependability in handling time-sensitive or high-impact technical incidents.
- Commitment to continuous learning and improvement of reliability, efficiency, and customer satisfaction.
Preferred Skills:
- ServiceNow administration or development experience (training available if not already acquired).
- Familiarity with SRE principles such as task automation, technical debt reduction, capacity management, and monitoring.
- Experience in a production support or DevOps/SRE role in an enterprise-scale environment.
- Exposure to IT service management (ITSM), SaaS platforms, and enterprise toolchains.
Education: Bachelor's Degree