REMOTE AI Systems Engineer - HPC

Company: CyberCoders

Location: Seattle, WA 98115

Description:

Title: Systems Engineer (Senior or Mid-level)
Location: FULLY remote!
Salary: $175k-$300k DOE + RSUs
Requirements: 3+ years of experience in Systems Engineering, DevOps, or AI/ML Infrastructure, plus AI/ML and HPC experience

Our global platform focuses on Bitcoin mining, Ethereum staking, and AI/High Performance Computing Infrastructure. Founded in 2017, we are now post-IPO, publicly traded, and experiencing MASSIVE GROWTH... Our revenue grew over 95% in the last YEAR! We have made several acquisitions and formed several partnerships during this period of growth, and we need YOUR HELP to keep it going.

We're currently looking to hire Systems Engineers to design, build, & optimize the infrastructure that powers AI-driven applications. You will work at the intersection of hardware, software, and data - enabling efficient deployment of AI models and solutions at scale.

The ideal candidates have a strong background in systems engineering, high-performance computing, software-defined networking, and general software development, with some experience in machine learning and in deploying and maintaining AI systems in production.

What You'll Be Doing
  • AI Infrastructure Design and Development:
    • Design and implement scalable AI/ML infrastructure.
    • Optimize AI pipelines for performance and reliability.
    • Integrate AI models using CI/CD best practices.
  • Model Deployment and Optimization:
    • Deploy AI models in various environments (cloud, edge, on-premises).
    • Optimize inference performance for latency, throughput, and energy efficiency.
    • Use tools like TensorRT and ONNX to accelerate models.
  • Systems Engineering:
    • Maintain HPC clusters, GPUs, and distributed systems.
    • Develop tools for system monitoring and troubleshooting.
    • Ensure AI system reliability through proactive maintenance.
  • Collaboration and Cross-Functional Work:
    • Align AI systems with overall product architecture.
    • Support AI researchers with efficient data pipelines and computing environments.
  • Security and Compliance:
    • Ensure compliance with security standards and data privacy regulations.
    • Secure sensitive data and models in production.
  • Emerging Technology Integration:
    • Stay updated with AI and machine learning advancements.
    • Integrate new tools and methods to enhance systems.

What You Need for this Position
  • 5+ years of Systems Engineering, DevOps, or AI/ML Infrastructure experience
  • 2+ years of experience in High Performance Computing
  • Experience building cloud computing platforms (from scratch is a huge plus)
  • Hands-on experience with AI frameworks (TensorFlow, PyTorch, etc.)
  • Experience deploying AI/ML models in production environments
  • Strong knowledge of distributed systems & HPC for AI workloads
  • Experience with containerization & orchestration tools (Docker, Kubernetes, etc.)
  • Programming skills in Golang, Python, or Rust
  • Familiarity with AI platform tools (Mosaic AI, Run.AI, SageMaker, Vertex AI, etc.)
  • Familiarity with InfiniBand and/or RoCEv2 networking, and NCCL
  • Proficiency in using AI hardware accelerators (GPUs, TPUs, etc.)
  • BS in Computer Science, Engineering, or a related field (Master's degree is a huge plus)

What's In It for You
  • $175k - $300k/year DOE
  • RSUs
  • 5 weeks PTO
  • 401k w/ match
  • Comprehensive Benefit Plan
