Inference Engineer
Join a small, senior team, building the full on-device stack to achieve realtime local intelligence.
Remote / SF / Europe
Full Time
The role
We're looking for engineers who can help us build the software that makes modern LLMs run efficiently on-device.
You'll primarily work on uzu, our inference engine. Your work will include:
Implementing new model architectures.
Optimizing kernels.
Supporting new modalities.
Adding new backends.
Building a wide range of features such as KV cache paging and continuous batching.
We are the frontier on-device AI lab.
We build the models, inference runtime, and quantization stack from the device constraint up, so AI can run at full capability on the hardware billions of people already own. Our stack spans from low-level GPU kernels to high-level model conversion tools. We're a small team obsessed with performance, working at the intersection of systems programming and machine learning research.
We encourage you to apply if you deeply understand at least one of:
How computers work.
How modern language models work.
and have experience with at least one of:
Writing high-performance GPU kernels.
Rust systems programming.
Implementing LLM architectures outside of high-level frameworks.
High-quality open source contributions.
We welcome applications from very talented students and early-career engineers.
Why us?
Founded by proven entrepreneurs who built and scaled consumer AI leaders like Reface (300M users) and Prisma (100M MAU).
Our team is small (16 people), senior, and deeply technical. We ship fast and own problems end-to-end.
We’re advised by a former Apple Distinguished Engineer who worked on MLX, and backed by leading AI-focused funds and individuals.
Interested?