
On-device AI research & products that make the assistant the only interface you need.

We build the models, runtime, and infrastructure to make on-device AI fast, capable, and accessible. On the hardware billions of people already own.

Read our research

Talk to us

[Demo · rental booking, on-device, airplane mode] Voice request processed locally: searched 4 providers — Hertz $78 · RAV4, Enterprise $84 · Escape, Avis $89 · Equinox — and booked Hertz at $78/day. 0 apps · 0 network calls · airplane mode.

[Demo · email draft, on-device, airplane mode] Message to Sarah composed and ready in 240 ms. 0 apps · 0 network calls · airplane mode.

The last decade shipped one app per service.

Every task earned its own icon. Rideshare. Bank. Calendar. Files. Messaging. Photos. We trained ourselves to context-switch between dozens of interfaces just to get one thing done.

Est. 8.9M mobile apps  ·  Apple App Store + Google Play, 2026

Chase · Calendar · Files · Uber: "Pay Anna, send the contract, schedule lunch Friday, ride to LAX at six."

The next decade collapses the app grid into one interface.

One instruction. Five services. Zero app launches. The assistant understands intent, not commands, and resolves everything in a single interaction, privately, on your device.

Chase · Calendar · Files · Uber · resolved in one interaction


AI on a device needs to be fast enough that it feels less like software and more like thinking.

mirai · predicting next token · 127 ms · on-device
Prompt: "Book a rental from ___"
LAX 0.55 · JFK 0.18 · Hertz 0.13 · the 0.08 · any 0.06
top-1 prob · 0.55 · generated on-device
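The probabilities in the demo come from a softmax over the model's output logits for each candidate next token. A minimal sketch — the logit values below are made up for illustration, not Mirai's actual numbers:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution that sums to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(x - m) for tok, x in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the prompt "Book a rental from ___"
logits = {"LAX": 4.0, "JFK": 2.9, "Hertz": 2.6, "the": 2.1, "any": 1.8}
probs = softmax(logits)
top1 = max(probs, key=probs.get)  # greedy decoding picks the top-1 token
```

Greedy decoding simply emits the top-1 token; sampling strategies instead draw from the full distribution.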

You say what you need.
It's done. No app to open, no form to fill, no network required.

This is what the next decade looks like. Running today, on real hardware, offline. The AI capable of collapsing the app grid into one interface already exists. What Mirai is doing is making sure it runs on the device in your pocket, not just in a lab.

Technical details

mirai — airplane mode · M4 Pro

1,000 t/s

Above this threshold, the assistant stops returning text and starts rendering interfaces. Speed is what turns tokens into a UI.

Below 1,000 t/s: slow chat, still thinking. At 1,240 t/s: instant UI, renders now.

What we are building to achieve 1,000 t/s on-device

1

Local models

Architecture designed from device memory constraints — not compressed cloud models. Maximizes arithmetic intensity at decoding, minimizes resident set.
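One way to see why architecture must start from memory constraints: single-stream decoding is typically memory-bandwidth-bound, so tokens/sec is capped by how fast the resident weights can be streamed per generated token. A back-of-envelope sketch — the parameter count, weight width, and bandwidth below are illustrative assumptions, not Mirai's measurements:

```python
def decode_tps_upper_bound(param_count, bytes_per_weight, mem_bandwidth_gbs):
    """Rough upper bound on decode tokens/sec for a memory-bandwidth-bound
    model: each generated token must stream all resident weights once."""
    resident_gb = param_count * bytes_per_weight / 1e9
    return mem_bandwidth_gbs / resident_gb

# Illustrative: a 1.2B-parameter model at 8-bit weights (1 byte each)
# on a chip with ~400 GB/s of memory bandwidth.
bound = decode_tps_upper_bound(1.2e9, 1.0, 400)  # ~333 t/s ceiling
```

Halving the resident bytes — via quantization or architecture — doubles this ceiling, which is why the model, quantization, and kernels are designed together rather than compressed after the fact.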

2

Inference engine

Built from scratch for device silicon — Metal kernels, hardware-native, not portable. The model and runtime co-evolve. No generic runtime can do this.

3

Quantization

W8A8+VQ co-designed with architecture and kernels. Hardware-accelerated weight loading. Without it, the model doesn't fit. With it, nothing is sacrificed.
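For intuition only, here is a generic symmetric int8 weight quantization sketch. This is a textbook illustration of the "W8" idea, not Mirai's actual W8A8+VQ scheme, which is co-designed with the architecture and kernels:

```python
def quantize_w8(weights):
    """Symmetric per-tensor int8 quantization: map the largest weight
    magnitude to 127, then round every weight to the nearest step."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]

weights = [0.02, -0.5, 0.31, -0.12]
q, scale = quantize_w8(weights)
restored = dequantize(q, scale)  # close to the originals, 4x smaller storage
```

Real schemes add per-group scales, activation quantization, and (as here) vector quantization on top, but the core trade — fewer resident bytes per weight at bounded error — is the same.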

4

The experience

Tokens become interfaces, forms, and resolved workflows. This is the product. Everything above is what makes it possible.

Silicon

Apple M-series. Already shipped — 1.4B devices worldwide. The silicon is done. The software is the open problem.

Apple Silicon — now
Android — coming soon
[Chart] Performance vs MLX · LFM2-1.2B · M1 Ultra — Mirai runtime vs MLX across three metrics: prompt speed (tokens/sec, higher is better), generate speed (tokens/sec, higher is better), and first token (latency, lower is better).


How we build it