Site Reliability Engineer

Company: J and M Group

Location: Toronto, ON M4E 3Y1

Description:

  • Bachelor's degree or equivalent in computer science or another technical or scientific field.
  • Minimum of 3 years of relevant technical experience.
  • Ability to troubleshoot problems and determine their root cause.
  • Ability to program (structured and object-oriented) in one or more high-level languages or frameworks, such as JavaScript, Node.js, React, or similar.
  • Experience with one or more of the following: scheduling (CA, CA WLA), SQL queries and scripting, Excel, Informatica development (or related ETL tools), shell scripting (UNIX shell, PowerShell), Windows batch scripting.
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN).
  • Experience with Agile (Scrum or Kanban), Jira and ServiceNow.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.


Preferred Qualifications:
  • Previous success in technical engineering or application support.
  • Coding experience beyond simple scripts.
  • Knowledge of site reliability engineering (SRE) concepts.
  • Experience with monitoring and alerting applications such as Moogsoft, Dynatrace, DEVO, New Relic or other similar tools.


Essential Skills:
  • Troubleshoot and optimize systems or processes.
  • Understand the business drivers and analytical use cases.
  • Address area-level risks; provide and implement mitigation plans.
  • Report on area readiness and quality, and raise red flags in crisis situations.
  • Monitor the production environment, taking a complete view of system health; track and produce key metrics for the team, and develop strategies to increase efficiency.
  • Maintain software and systems to run the applications.
  • Improve reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
  • Provide primary operational support for multiple large, distributed software applications.
  • Work with business clients and internal and external teams to debug or solve application issues.
  • Be available to provide on-call support to resolve issues as required.
