Mirai Labs: Inference Engineer

Inference Engineer

Join a small, senior team building the full on-device stack for real-time local intelligence.

Remote / SF / Europe

Full Time

The role

We're looking for engineers who can help us build the software that makes modern LLMs run efficiently on-device.

You'll primarily work on uzu, our inference engine. The work includes:

Implementing new model architectures.
Optimizing kernels.
Supporting new modalities.
Adding new backends.
Building a wide range of features such as KV cache paging and continuous batching.

We are the frontier on-device AI lab.

We build the models, inference runtime, and quantization stack from the device constraints up, so that AI can run at full capability on the hardware billions of people already own. Our stack spans from low-level GPU kernels to high-level model conversion tools. We're a small team obsessed with performance, working at the intersection of systems programming and machine learning research.

We encourage you to apply if you deeply understand at least one of:

How computers work.
How modern language models work.

and have experience with at least one of:

Writing high-performance GPU kernels.
Rust systems programming.
Implementing LLM architectures outside of high-level frameworks.
High-quality open source contributions.

We welcome applications from very talented students and early-career engineers.

Why us?

We were founded by proven entrepreneurs who built and scaled consumer AI leaders such as Reface (300M users) and Prisma (100M MAU).

Our team is small (16 people), senior, and deeply technical. We ship fast and own problems end-to-end.

We’re advised by a former Apple Distinguished Engineer who worked on MLX, and backed by leading AI-focused funds and individuals.

Backed by leading AI builders and investors:

Awni Hannun

Anthropic, Apple MLX co-creator

Francois Chaubard

Y Combinator Partner

David Singleton

/dev/agents, ex-Stripe, Google

Ben Parr

TheoryForge VC, Moltbook

Mati Staniszewski

Co-founder, ElevenLabs

Marcin Żukowski

Co-founder, Snowflake
