Site Reliability Engineer, Product - USDS
Company: TikTok
Location: Seattle, WA 98115
Description:
About the team
The Product Engineering team monitors and maintains the availability of TikTok, including services such as video playback, content discovery/recommendations, live streaming, and customer service feedback.
To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.
Responsibilities
In this role, you will:
- Gain a solid understanding of the various components and services that power the TikTok experience
- Maintain services to meet service-level agreements (SLAs) and service-level objectives (SLOs) by measuring and monitoring availability, performance, and overall system health
- Work as part of a global team to resolve site-up issues and ensure that services are reliable, fault-tolerant, efficiently scalable, and cost-effective
- Scale systems sustainably through mechanisms such as automation; improve system reliability, efficiency, and velocity by pushing for changes
- Provide user support, incident response, and postmortems
Qualifications
Minimum Qualifications
1. Bachelor's degree or higher in Computer Science or a related technical discipline, with 3-5+ years of experience in the deployment and administration of large-scale distributed systems
2. Strong understanding of Unix/Linux operating systems internals and administration, networking (e.g. TCP/IP, routing, network topologies and hardware), storage systems, and database systems
3. Experience in one or more programming languages, such as C, C++, Java, Python, Go, Ruby, Rust, JavaScript
4. Experience in debugging and optimizing code and automating routine tasks
5. Experience in the development, testing, deployment, and administration of one or more of the following systems: Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, Kafka
6. Experience in designing and analyzing large-scale distributed systems is preferred
7. Strong skills in problem solving and communication
Candidates for this position must be legally authorized to work in the United States. This position is not eligible for visa sponsorship or support.