Site Reliability Engineer
Company: J and M Group
Location: Toronto, ON M4E 3Y1
Description:
- Bachelor's degree or equivalent in computer science or another technical or scientific field.
- Minimum of 3 years of relevant technical experience.
- Ability to solve problems and determine their root cause.
- Ability to program (structured and OO) with one or more high-level languages or frameworks, such as JavaScript, React, Node.js, or similar.
- Experience with one or more of the following: scheduling (CA, CA WLA), SQL queries and scripting, Excel, Informatica development (or related ETL tools), shell scripting (UNIX shell, PowerShell), Windows batch scripting.
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN).
- Experience with Agile (Scrum or Kanban), Jira and ServiceNow.
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Preferred Qualifications:
- Previous success in technical engineering or application support.
- Coding experience beyond simple scripts.
- Knowledge of site reliability engineering (SRE) concepts.
- Experience with monitoring and alerting applications such as Moogsoft, Dynatrace, DEVO, New Relic, or similar tools.
Essential Skills:
- Troubleshoot and optimize systems and processes.
- Understand the business drivers and analytical use cases.
- Address area-level risks; provide and implement mitigation plans.
- Report on area readiness and quality, and raise red flags in crisis situations.
- Monitor the production environment, taking a complete view of system health; track and produce key metrics for the team and develop strategies to increase efficiency.
- Maintain software and systems to run the applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
- Provide primary operational support for multiple large, distributed software applications.
- Work with business clients and internal and external teams to debug or resolve application issues.
- Be flexible in providing on-call support to resolve issues as required.