Inference Engineer
Join a small, senior team building the full on-device stack to achieve real-time local intelligence.
Join a small, senior team building the fastest on-device AI inference engine. Powering real products, not demos.
Remote / SF / Europe
Full Time
The role
We're looking for engineers who can help us build the software that makes modern LLMs run efficiently on-device.
You'll primarily work on uzu, our inference engine:
Implementing new model architectures
Optimizing kernels
Supporting new modalities
Adding new backends
Building a wide range of features, such as KV cache paging and continuous batching
We are the frontier on-device AI lab.
We build the models, the inference runtime, and the quantization stack, designed from the device's constraints up, so that AI can run at full capability on the hardware billions of people already own. Our stack spans everything from low-level GPU kernels to high-level model conversion tools.
We're a small team obsessed with performance, working at the intersection of systems programming and machine learning research.
We encourage you to apply if you deeply understand at least one of:
How computers work
How modern language models work
and have experience with at least one of:
Writing high-performance GPU kernels
Rust systems programming
Implementing LLM architectures outside of high-level frameworks
High-quality open source contributions
We welcome applications from very talented students and early-career engineers.
Why us?
Founded by proven entrepreneurs who built and scaled consumer AI leaders like Reface (300M users) and Prisma (100M MAU).
Our team is small (16 people), senior, and deeply technical. We ship fast and own problems end-to-end.
We’re advised by a former Apple Distinguished Engineer who worked on MLX, and backed by leading AI-focused funds and individuals.
Backed by leading AI builders and investors.
Interested?