Site Reliability Engineer

Company: Codeforce360

Location: Allen, TX 75002

Description:

Required Skills:

Python, Kubernetes, Containers, Docker, Amazon Web Services (AWS), and Linux.
Looking for a true Site Reliability Engineer, NOT a DevOps engineer. This team is not interested in CI/CD automation, just a focused SRE.
Will be setting up AWS cloud infrastructure with Python.
Must be:
- Expert in Python programming and development (knows "how do you set up EC2 container with Python")
- Great with AWS
- Great with Linux
- Good understanding of container security (Docker/Kubernetes)

Basic Qualifications:

Python, Kubernetes, Docker, Amazon Web Services (AWS), and Linux.
Site Reliability Engineer, NOT a DevOps Engineer.

About the Role:

Our Site Reliability Engineering (SRE) team is responsible for the cloud infrastructure and observability for the entire Client Consumer Information Services (North America) division.
We are seeking a Software Engineer for our SRE team to help us continuously improve how we build, monitor, secure and run our rapidly growing cloud platform.
Much of our software development focuses on building infrastructure and eliminating work through automation.
On the SRE team, you will have the opportunity to use your expertise in coding, system design thinking, and analytical skills to provide reliable cloud infrastructure and observability tools for the rest of the product development teams.

What You'll Do Here:

Build our platforms, systems and infrastructure using your solid expertise in coding.
Work closely with product development teams, providing hands-on engagement to develop, direct, and implement reliable, secure, and cost-effective cloud solutions.
Participate in a cross-department, cross-functional team to establish a cloud operational governance framework.
Often handle routine grunt work on service requests to assist other teams with platform services.

Must Have Skills:

Deep understanding of Linux, networking, cloud design patterns, APIs, and security.
Solid professional coding experience with at least one scripting language, such as Shell or Python.
At least 3 years of experience working with AWS infrastructure services, with emphasis on IAM, networking, EC2, Lambda, S3, CloudWatch, CloudTrail, and security in general.
Strong knowledge of, and implementation experience with, Terraform, Packer, Ansible, Chef, Jenkins, or similar tooling.
Excellent knowledge of, and working experience implementing, one or more observability platforms such as Prometheus, InfluxDB, Dynatrace, Grafana, or Splunk to measure telemetry data such as logs, metrics, and traces.

Nice to have skills:

Previous experience running containers (Docker/LXC) in a production environment using a container orchestration service (Kubernetes, Docker Swarm, AWS ECS, AWS EKS).
Experience with other public cloud platforms like Azure and GCP is a bonus.
Solid professional coding experience in at least one programming language, preferably Java.
Experience with big data platforms such as AWS EMR, Databricks, Cloudera, or Hortonworks.
Experience with open-source technologies such as Hadoop, Hive, Presto, Spark, or Airflow.