DevOps Engineer (Full-time)

Company: Strong Compute

Location: San Francisco, CA 94112

Description:

AI workloads are brutal: petabytes of data, distributed jobs, and real-time GPU orchestration. We're building an AI-first DevOps infrastructure that makes compute reliable, scalable, and cost-effective. If you love infrastructure automation, cloud-native engineering, and AI performance tuning, you'll love this role.

What you'll do

Design and manage scalable, fault-tolerant AI compute infrastructure

Automate GPU provisioning, multi-cloud scheduling, and scaling strategies

Improve observability, logging, and monitoring for real-time AI workloads

Optimize containerized deployments for Kubernetes, Nomad, or Slurm

Enhance security, CI/CD, and cloud networking for high-performance distributed training

Implement security best practices for DevOps pipelines, including secrets management, infrastructure security, and compliance automation

Reduce infrastructure cost and maximize performance through automation and tuning

What we're looking for

Deep knowledge of CI/CD pipelines and infrastructure as code

Hands-on experience with monitoring and logging tools (Prometheus, Grafana, OpenTelemetry)

Proficiency in shell scripting, Python, or Go for automation

Experience with security best practices for cloud environments, including IAM, container security, and incident response

Nice to have:

Experience managing large-scale clusters and cloud infrastructure, with Kubernetes or other orchestration approaches

Experience with Terraform, Ansible, Helm, or Pulumi

Understanding of AI/ML compute environments (GPUs, CUDA, NCCL, Slurm, Horovod)

Our culture

We move fast. We ship weekly: new features, improvements, and fixes go live quickly.

We test big. Every month, we stress-test with large groups of users face to face, get real-world feedback, and iterate rapidly.

We build together. On-site only, in SF or Sydney.

We iterate relentlessly. Direct user feedback shapes our roadmap: we release, test, refine, and keep moving.

We travel when needed. Engineers may travel between SF and Sydney to run events and meet with clients.

Location: SF or Sydney (OG startup house vibe, great food, late nights, all the GPUs)

Equipment & Benefits:

Top-spec MacBook, plus a separate GPU cluster dev environment for each engineer.

Weekly cash bonus when you work out 3+ times a week.

Comprehensive health benefits, including a choice of Kaiser, Aetna OAMC, and HDHP (HSA-eligible) plans for our SF-based team members.

The world's longest exercise window for options: 20 years.

Don't have all the skills? Apply anyway! We're looking for people who move fast, learn fast, and ship fast. If that's you, let's talk.

Want to get to know us first? Attend one of our upcoming events.