Site Reliability Engineer

Company: Xurrent

Location: Austin, TX 78745

Description:

Join Xurrent, a dynamic and innovative company that is reshaping the landscape of IT Service Management (ITSM). At Xurrent, we believe in the power of transformation, not just in the solutions we provide to our clients, but also in the careers of our team members. As a leader in the industry, we are on a mission to attract the best talent, those who are driven by curiosity, innovation, and a passion for making a real impact. Our platform is revolutionizing the way organizations manage their service relationships, and we are looking for dedicated individuals who are ready to join us in this journey of transformation.

If you're eager to be part of a forward-thinking company that thrives on community, engaging work, and grit, Xurrent is the place for you. We don't just offer jobs; we offer transformative career experiences. Our culture is built on empowerment, and we provide the resources and support needed for our team members to excel, innovate, and drive change. Join us at Xurrent and be part of a team that is shaping the future of ITSM while propelling your own career to new heights.

Overview

Xurrent, Inc. ("Xurrent") is seeking a Site Reliability Engineer. The Site Reliability Engineer (SRE) will play a crucial role in ensuring the reliability, availability, and performance of Xurrent's services. As a key technical contributor, the SRE will establish and maintain robust systems and processes, minimizing disruptions and maximizing the overall quality of technical experiences. Much of the software development focuses on optimizing existing systems, building infrastructure, and eliminating manual work through automation. Experience with both Azure and AWS is highly desirable; familiarity with Azure is required. Reporting to the Director of Site Reliability and TechOps, the SRE will collaborate closely with internal teams to drive operational excellence in site reliability. The SRE keeps a continuous eye on the capacity and performance of Xurrent's systems. Our goal is to provide our customers with the best, most reliable, and fastest experience possible.

Key Outcomes

3 Months
  • Gain a deep understanding of Xurrent's infrastructure, services, and systems as evaluated by senior members of the technical staff.
  • Identify and document potential areas for improvement in reliability, availability, and performance within Xurrent's SDLC.
  • Review existing incident response and resolution processes.
  • Identify opportunities to improve monitoring and alerting systems.
  • Cross-train on Azure- and AWS-hosted applications and technologies, building the foundation for cross-coverage between DevOps and SRE.

6 Months
  • Offer insight and opinions on investments and resourcing decisions related to infrastructure and reliability.
  • Assist with systems design to protect all customer data and critical systems from any single person's influence.
  • Continually iterate on incident response and resolution processes, incorporating best practices and lessons learned.
  • Test the comprehensive disaster recovery plan's effectiveness.
  • Establish failure testing protocols similar in nature to chaos engineering.
  • Identify areas for cost optimizations and communicate them.

12 Months
  • Contribute to the Change Advisory Board (CAB) and actively participate in audits to drive continuous improvements in reliability and stability.
  • Continuously seek opportunities to improve rollouts of infrastructure changes.
  • Carry out a proactive maintenance plan for critical systems to prevent service disruptions.
  • Identify potential bottlenecks and opportunities to improve the scaling of Xurrent's infrastructure.
  • Improve Xurrent templates and workflows to better fit the team's needs.

Responsibilities / Ownership
  • Provide support for infrastructure, site reliability, and service uptime for Azure- and AWS-hosted technology.
  • Work with the team on technical initiatives related to site reliability engineering within the organization to improve the availability, scalability, latency, and efficiency of Xurrent's services.
  • Support and participate in maintaining a culture of operational excellence, emphasizing proactive monitoring, incident response, and continuous improvement.
  • Collaborate with internal teams to address infrastructure-related challenges and ensure alignment with business objectives.
  • Provide guidance to other team members on managing availability and performance of the Xurrent services, building automation to prevent problem recurrence, and building automated responses for non-exceptional service conditions.
  • Proactively identify and mitigate risks to minimize service disruptions and optimize performance.
  • Research and stay abreast of new technologies and solutions that would potentially benefit Xurrent and its users.


Requirements:
  • 2-5 years of experience developing backend systems or Infrastructure as Code (IaC) templates.
  • Strong interpersonal skills
  • Strong functional networking knowledge
  • Strong application and container security knowledge
  • AWS CodePipeline / CodeBuild experience
  • Basic development skills with JavaScript/Python/Ruby

Preferred:
  • AWS DevOps or Cloud Practitioner certification
  • CloudFormation knowledge


Statement of Equal Opportunity

Xurrent is an equal opportunity employer. We're committed to creating an inclusive environment for all our employees, where different backgrounds and perspectives are valued and encouraged, regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, sexual orientation, or any other protected group status under any applicable law.
