Lead Site Reliability Engineer
Company: VDart Inc
Location: San Jose, CA 95123
Description:
Job Title: Lead Site Reliability Engineer
Location: San Jose, CA (2 Days Hybrid)
Duration / Term: 6+ months
Job Description:
Experience Desired: 14+ Years.
Responsibilities:
Please look for 14 years of hands-on coding/scripting (Ansible), Python, and cloud computing experience.
About the Role
We seek a highly skilled and dynamic Site Reliability Engineer - Consultant. In this role you will:
Maintain and improve the reliability, performance, and availability of software systems.
Act as a bridge between traditional IT operations and software development, bringing a software engineering approach to system administration.
Job Responsibilities
Creating and supporting automation scripts (shell/Ansible/Python) for infrastructure deployments, validations, and monitoring to improve operational tasks
Scheduling monitoring scripts using cron and Airflow
Monitoring using tools including Dynatrace, Apica, Grafana, etc.
Database handling
Building CI/CD pipelines
Incident handling and problem management
Mandatory Skills
Experience in Ansible/Python
Monitoring Tools - Dynatrace/Apica/Grafana
Required Education
Bachelor's degree in computer science or a related field.
Required Experience
14+ years of IT infrastructure experience
Extensive experience working with Linux flavors like RHEL/CentOS, shells, filesystems, and utilities
Experience in programming and automation with Python and Ansible
Knowledge of distributed computing and experience working with container orchestration frameworks, including on-prem and Rancher Kubernetes, and good knowledge of Kubernetes objects
Experience working with storage (ONTAP preferred): volumes, aggregates, backups, DR planning
Experience scheduling monitoring scripts using cron and Airflow
Experience with monitoring tools including Dynatrace, Apica, Grafana, etc.
Database knowledge including SQL and NoSQL databases
Experience building CI/CD pipelines (preferred)
Cloud platform knowledge (specifically AWS) is required
Key Skills:
SRE, AWS, Python, Monitoring Tools - Dynatrace/Apica/Grafana