Staff Product Manager, ML Platform
Company: Cerebras Systems
Location: Sunnyvale, CA 94086
Description:
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.
Cerebras' current customers include global corporations across multiple industries, national labs, and top-tier healthcare systems. In January, we announced a multi-year, multi-million-dollar partnership with Mayo Clinic, underscoring our commitment to transforming AI applications across various fields. In August, we launched Cerebras Inference, the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services.
About The Role
In this role, you will design and own the primary interface that ML researchers and scientists use to train massive LLMs (1+ trillion parameters) on Cerebras chips. Speed of iteration is critical for accelerating breakthroughs, and you'll be at the helm of ensuring we systematically reduce the time-to-insight for ML research across diverse domains.
You will work with a deeply technical product team to drive the vision and strategy for Cerebras' ML training ecosystem, including CSTorch (our PyTorch-equivalent framework) and Model Zoo (a library of high-level abstractions and domain-specific tools for training and fine-tuning LLMs). You will lead the creation of a seamless platform that enables researchers to preprocess data, pre-train, fine-tune, and evaluate models effortlessly on Cerebras hardware. By building intuitive workflows, extensible tools, and integrated libraries, you'll empower both cutting-edge ML research and domain-specific innovation.
As the Cerebras ML Platform PM, you'll play a pivotal role in advancing AI across industries, working with the most cutting-edge training techniques and collaborating with a world-class research and engineering team.
Responsibilities
- Develop and provide a deep intuition for the ML researcher training workflow, including data preprocessing, training, fine-tuning, and evaluations.
- Define and execute the product roadmap for Model Zoo (our ML training library) and CSTorch (our ML framework), ensuring they form a flexible, beautifully designed, and extensible platform.
- Collaborate with our internal AppliedML team, as well as external ML researchers and domain scientists, to design features that dramatically reduce time-to-insight and accelerate breakthroughs.
- Drive cross-functional collaboration to align product roadmaps and execute priorities across frameworks and libraries.
- Be the voice of the user! Define relevant success metrics and continuously incorporate both feedback and emerging trends in ML to refine CSTorch and Model Zoo, maintaining leadership in the space.
- Work across Product, Engineering, and business leadership to help define our product go-to-market approach to maximize value to users and expand our user community over time.
- Communicate roadmaps, priorities, experiments, and decisions clearly across a wide spectrum of audiences from internal customers to executives.
Qualifications
- Bachelor's or Master's degree in computer science, electrical engineering, physics, mathematics, a related scientific/engineering discipline, or equivalent practical experience.
- 4-6+ years of product management experience in developer tools, ML frameworks, or software platforms.
- Strong understanding of typical training and fine-tuning workflows, including model development, iterative experimentation, and debugging.
- Familiarity with machine learning/deep learning concepts and techniques for training modern models.
- Proven ability to collaborate across engineering, research, and user-facing teams to deliver impactful solutions.
- Experience working with a data science/ML stack, including TensorFlow and PyTorch.
- Experience developing machine learning applications or building tools for machine learning application developers.
- An entrepreneurial sense of ownership of overall team and product success, and the ability to make things happen around you. A bias towards getting things done, owning the solution, and driving problems to resolution.
- Outstanding presentation skills with a strong command of verbal and written communication.