SRE Engineer
Company: Tata Consultancy Services
Location: Miami, FL 33186
Description:
Objectives of this role
Run the production environment by monitoring availability and taking a holistic view of system health
Build software and systems to manage platform infrastructure and applications
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
Provide primary operational support and engineering for multiple large-scale distributed software applications
Responsibilities
Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service-level objectives
Role
SRE (Site Reliability Engineer) / Lead: AppDynamics, Splunk, New Relic, Datadog, CloudWatch, Akamai, with strong full-stack technical experience.
Oversee the SRE team, ensuring high availability and reliability of services.
Manage incidents and drive post-mortem analyses to prevent recurrence.
Liaise with management to provide updates on service reliability metrics and team performance.
Implement monitoring, alerting, and incident response strategies.
Conduct on-call duties and participate in incident response.
Contribute to post-mortem analysis and service improvement efforts.
Provide technical support and troubleshooting assistance.
Respond to support tickets, diagnose issues, and offer solutions.
Assist in maintaining documentation of known issues and resolutions.
Salary Range: $100,000-$130,000 a year