AI Performance Engineer

Company: Parasail

Location: San Francisco, CA 94112

Description:

Parasail is redefining AI infrastructure by enabling seamless deployment across a distributed network of GPUs, optimizing for cost, performance, and flexibility. Our mission is to empower AI developers with a fast, cost-efficient, and scalable cloud experience that is free from vendor lock-in and designed for the next generation of AI workloads.

Job Description:

The AI Performance Engineer plays a crucial role in delivering a competitive platform by focusing on efficiently scheduling, executing, and managing AI workloads on distributed compute systems. This role is deeply technical, spanning from bare-metal GPU concepts to distributed AI orchestration. This position is about more than optimization; it's about pioneering efficient infrastructure that supports AI's role in reshaping productivity, transforming industries, and addressing some of the world's most challenging problems. You'll ensure that generative AI, including large language models, diffusion models, and multi-modal models, operates efficiently at enterprise scale and delivers continuous cost and sustainability improvements.

Responsibilities:
  • Employ sophisticated model distribution strategies for managing AI workloads at enterprise scale, including inference, training, fine-tuning, and data preprocessing. Develop and implement advanced job scheduling and resource management algorithms to maximize GPU utilization and minimize computational overhead, addressing both compute and memory constraints.
  • Operate closer to the hardware for Generative AI, focusing on building and integrating solutions to significantly boost performance and hardware utilization. Enhance the speed and efficiency of off-the-shelf compiler and distributed AI solutions.
  • Innovate and contribute to our core intellectual property, setting our solutions apart from non-scalable, non-enterprise-grade alternatives by improving and expanding upon popular open-source frameworks for the development and serving of AI workflows.

Qualifications:
  • Expertise in GPU computing, including low-level platforms such as CUDA, ROCm, PyTorch backends, TensorFlow XLA, etc. Strength in compiled languages like C++ is key.
  • A production-focused mindset, emphasizing the creation of quality, production-grade code and the use of robust, proven tools and methodologies, complemented by a relentless curiosity and drive to stay at the forefront of AI and GPU computing technologies.
  • A profound passion for learning and exploring the frontier of AI technology, coupled with a strong drive to craft innovative solutions that meet the demands of enterprise-scale challenges.
  • Demonstrated experience optimizing AI workloads, with a solid background in performance analysis and optimization of AI/HPC tasks in a production environment, is not required but is a major plus.

What You Bring to the Table: We are looking for people who are eager to learn and master the lower-level compute concepts that are critical for the AI revolution. With us, your work will go beyond writing code: it will have a significant impact on the scalability and efficiency of AI applications at large. If you're ready for the challenge of optimizing AI performance and eager to push our technological prowess to new heights, we're excited to welcome you aboard.