Member of Technical Staff - Edge AI Inference Engineer


Company: Liquid AI

Location: San Francisco, CA 94112

Description:

Liquid AI, an MIT spin-off, is a foundation model company headquartered in Boston, Massachusetts. Our mission is to build capable and efficient general-purpose AI systems at every scale.

Our goal at Liquid is to build the most capable AI systems to solve problems at every scale, so that users can build, access, and control their own AI solutions, and so that AI is integrated meaningfully, reliably, and efficiently across enterprises. Long term, Liquid will create and deploy frontier-AI-powered solutions that are available to everyone.

What this role actually is:

As we prepare to deploy our models across various edge device types, including CPUs, embedded GPUs, and NPUs, we seek an expert to optimize inference stacks tailored to each platform. We're looking for someone who can take our models, dive deep into the task, and return with a highly optimized inference stack, leveraging existing frameworks such as llama.cpp, ExecuTorch, and TensorRT to deliver high throughput and low latency.
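
For a concrete flavor of this kind of work, here is a minimal sketch of the standard ExecuTorch export flow: capture a model graph with torch.export, lower it to the Edge dialect, and serialize a .pte program for the on-device runtime. The toy model and output path are placeholders, not Liquid's actual models.

```python
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 128)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel().eval()
example_inputs = (torch.randn(1, 128),)

# Capture the graph, lower it to the Edge dialect, then serialize an
# ExecuTorch program that the on-device C++ runtime can load and run.
exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch()

with open("tiny_model.pte", "wb") as f:  # placeholder output path
    f.write(et_program.buffer)
```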

The ideal candidate is a highly skilled engineer with extensive experience in inference on embedded hardware and a deep understanding of CPU, NPU, and GPU architectures. They should be self-motivated, capable of working independently, and driven by a passion for optimizing performance across diverse edge hardware platforms.

Proficiency in building and enhancing edge inference stacks is essential. Additionally, experience with mobile development and expertise in cache-aware algorithms will be highly valued.
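
To illustrate the memory-access effects that cache-aware work targets, here is a toy Python/NumPy sketch (illustrative only; production kernels would live in C++ or vendor intrinsics): summing the rows of a row-major array reads contiguous memory, while summing its columns strides across cache lines.

```python
import time
import numpy as np

# 4096 x 4096 float64 array (~128 MB), C-contiguous (row-major) by
# default: each row is contiguous in memory, each column is strided.
n = 4096
a = np.random.rand(n, n)

t0 = time.perf_counter()
row_total = sum(a[i, :].sum() for i in range(n))  # contiguous reads
t1 = time.perf_counter()
col_total = sum(a[:, j].sum() for j in range(n))  # strided reads: roughly one element per cache line
t2 = time.perf_counter()

print(f"row traversal:    {t1 - t0:.3f}s")
print(f"column traversal: {t2 - t1:.3f}s")  # typically several times slower
```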

Responsibilities & Qualifications
    • Strong ML Experience: Proficiency in Python and PyTorch, to interface with the ML team at a deeply technical level.
    • Hardware Awareness: A solid understanding of modern hardware architecture, including cache hierarchies and memory access patterns, and their impact on performance.
    • Coding Proficiency: Expertise in Python, C++, or Rust for AI-driven real-time embedded systems.
    • Optimization of Low-Level Primitives: Optimizing core primitives to ensure efficient model execution.
    • Self-Guidance and Ownership: Ability to independently take a PyTorch model and its inference requirements and deliver a fully optimized edge inference stack with minimal guidance; a sketch of the kind of baseline measurement involved follows this list.
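
As a sketch of the baseline measurement that this ownership implies, here is a minimal CPU latency harness in plain PyTorch (the model, shapes, and iteration counts are hypothetical stand-ins):

```python
import time
import torch

@torch.inference_mode()
def measure_latency_ms(model, example_input, warmup=10, iters=100):
    """Mean per-call latency in milliseconds on the current device."""
    for _ in range(warmup):  # warm allocator, caches, and dispatch paths
        model(example_input)
    t0 = time.perf_counter()
    for _ in range(iters):
        model(example_input)
    return (time.perf_counter() - t0) / iters * 1e3

model = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
).eval()
x = torch.randn(1, 256)
print(f"mean latency: {measure_latency_ms(model, x):.3f} ms")
```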
