Mirai Labs: Inference Engineer

Product

Models library

Docs

MacOS app

Careers

Company

1455

Inference Engineer

Join a small, senior team building the full on-device stack to achieve real-time local intelligence.

Join a small, senior team building the fastest on-device AI inference engine. Powering real products, not demos.

Remote / SF / Europe

Full Time

Apply

The role

We're looking for engineers who can help us build the software that makes modern LLMs run efficiently on-device.

You'll primarily work on uzu, our inference engine. The work includes:

Implementing new model architectures
Optimizing kernels
Supporting new modalities
Adding new backends
Building a wide range of features, such as KV cache paging and continuous batching

We are the frontier on-device AI lab.

We build the models, inference runtime, and quantization stack from the device constraint up, so AI can run at full capability on the hardware billions of people already own. Our stack ranges from low-level GPU kernels to high-level model conversion tools.

We're a small team obsessed with performance, working at the intersection of systems programming and machine learning research.

We encourage you to apply if you deeply understand at least one of:

How computers work
How modern language models work

and have experience with at least one of:

Writing high-performance GPU kernels
Rust systems programming
Implementing LLM architectures outside of high-level frameworks
High-quality open source contributions

We welcome applications from very talented students and early-career engineers.

Why us?

Founded by proven entrepreneurs who built and scaled consumer AI leaders like Reface (300M users) and Prisma (100M MAU).

Our team is small (16 people), senior, and deeply technical. We ship fast and own problems end-to-end.

We’re advised by a former Apple Distinguished Engineer who worked on MLX, and backed by leading AI-focused funds and individuals.

Backed by leading AI builders and investors:

Awni Hannun

Anthropic, Apple MLX co-creator

Francois Chaubard

Y Combinator Partner

David Singleton

/dev/agents, ex-Stripe, Google

Ben Parr

TheoryForge VC, Moltbook

Mati Staniszewski

Co-founder, ElevenLabs

Gokul Rajaram

Google, Coinbase, Trade Desk

Marcin Żukowski

Co-founder, Snowflake

Interested?

Join a small, senior team building the full on-device stack to achieve real-time local intelligence.

Apply

Main

Platform / SDK

Inference Runtime

Models Conversion

Models Library

MacOS App

Blog

Docs

Company

About us

Careers

Links

X (Twitter)

GitHub

Discord
