Models, runtime & infrastructure to make on-device AI interactive, ambient & continuous.

We believe the next generation of AI
products will be built around continuous
interaction, not isolated prompts & chat
templates. That only becomes possible
when latency disappears.

The last decade shipped one app per service.

Every task earned its own icon. Bank. Calendar. Files. Messaging. We trained ourselves to context-switch between dozens of interfaces just to get one thing done.

Est. 8.9M mobile apps · Apple App Store + Google Play, 2026

The next decade collapses the app grid into one interface.

One instruction. Multiple services. Zero app launches.
The assistant resolves everything in a single
interaction, privately, on your device.

Chase · Calendar · Files · Uber · resolved in one interaction

Execution speed we are aiming for ...

1,000 t/s is where models become interfaces.

Slow chat: thinking...
AI-native interfaces at 1,000 t/s: renders almost instantly

User experience breaks
when the model is slow.

AI interfaces nowadays are just chatboxes.
You enter a request and wait. People don’t like
spending more than 200ms in a loading state.

The user sees 1 response. The
model generates 100+ internally.

AI models perform many internal steps:
parsing, validation, tool calls, reasoning,
and more, all of which take time.
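
A rough sketch of the arithmetic behind this: how long a run of internally generated tokens takes at a given decode rate, compared with the 200 ms threshold above. It assumes the internal work can be counted in tokens and that time is simply tokens divided by rate. The 200-token workload, the 50 t/s "slow chat" rate, and the function name are hypothetical and chosen only for comparison.

```python
def generation_time_ms(tokens: int, tokens_per_second: float) -> float:
    """Time to emit `tokens` at a steady decode rate, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens * 1000.0 / tokens_per_second


BUDGET_MS = 200.0  # loading-state threshold quoted in the text

# 200 internal tokens is an assumed workload; 50 t/s is an assumed slow rate.
for rate in (50.0, 1000.0):
    spent = generation_time_ms(200, rate)
    verdict = "within" if spent <= BUDGET_MS else "over"
    print(f"{rate:6.0f} t/s: {spent:.0f} ms ({verdict} budget)")
```

Under these assumptions, the same internal work that fits the budget at 1,000 t/s overshoots it twentyfold at 50 t/s.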

Your AI should be fast enough
that you don’t notice it at all.

Communication gets better when people receive and produce information at the same time.

mirai · airplane mode · M4 Pro

1,000 t/s requires a different execution stack.

Interactive AI isn’t one response. It’s a continuous loop: parsing, validating, ranking, rendering. Each step has a latency budget. Miss one, the experience breaks.
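
One way to picture a per-step latency budget is the minimal sketch below. The step names follow the loop described above; the budget values, the function names, and the placeholder steps are all hypothetical and do not describe Mirai's runtime.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical per-step budgets in milliseconds (they sum to 200 ms).
BUDGETS_MS: Dict[str, float] = {
    "parse": 20.0,
    "validate": 30.0,
    "rank": 50.0,
    "render": 100.0,
}


def over_budget(timings_ms: Dict[str, float],
                budgets_ms: Dict[str, float] = BUDGETS_MS) -> List[str]:
    """Return the steps whose measured time exceeded their budget."""
    return [step for step, spent in timings_ms.items()
            if spent > budgets_ms[step]]


def run_loop_once(steps: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run each step in order and record its wall-clock time in ms."""
    timings: Dict[str, float] = {}
    for name, fn in steps:
        start = time.perf_counter()
        fn()
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings


# Placeholder steps that do no real work; a real loop would plug in its own.
steps = [(name, lambda: None) for name in BUDGETS_MS]
timings = run_loop_once(steps)
print(over_budget(timings))
```

Checking each step against its own budget, rather than only the total, shows which stage broke the experience.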

To solve that, we are building every layer from scratch:

From the device constraint up.
A runtime that executes at batch size = 1.
A quantization scheme co-designed with the model architecture.

Why can't we just run a cloud model on a device?

On-device is not a
smaller cloud. It’s a
different system entirely.

Cloud

Large batches
Throughput-first
Memory-rich
Compute-saturating

On-device

Batch size = 1
Latency-first
Memory-constrained
Bandwidth-sensitive

Performance emerges from co‑design across the stack.

On a device, every request is processed alone: no batching and no parallelism across requests. Mirai co-designs models and runtime specifically for this constraint on Apple Silicon.

  • Minimizing memory footprint.
  • Maximizing arithmetic intensity.
  • Maximizing neural accelerator utilization.
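
To see why memory footprint and arithmetic intensity matter at batch size = 1, here is a back-of-the-envelope sketch. It is not a description of Mirai's runtime. It assumes decoding is limited by memory bandwidth, that every weight is read once per generated token, and that KV-cache and activation traffic can be ignored. The 1B-parameter model and the 273 GB/s bandwidth figure are illustrative assumptions, and all function names are hypothetical.

```python
def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the weights at a given precision."""
    return num_params * bits_per_weight / 8.0


def max_decode_tokens_per_second(num_params: float,
                                 bits_per_weight: float,
                                 bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if each token reads every weight once."""
    return bandwidth_bytes_per_s / weight_bytes(num_params, bits_per_weight)


def arithmetic_intensity(bits_per_weight: float) -> float:
    """FLOPs per weight byte for a matrix-vector product at batch size 1:
    one multiply and one add per weight."""
    return 2.0 / (bits_per_weight / 8.0)


PARAMS = 1e9        # assumed model size (illustrative)
BANDWIDTH = 273e9   # assumed memory bandwidth in bytes/s (illustrative)

for bits in (16, 8, 4):
    bound = max_decode_tokens_per_second(PARAMS, bits, BANDWIDTH)
    intensity = arithmetic_intensity(bits)
    print(f"{bits:2d}-bit: <= {bound:.1f} t/s, {intensity:.0f} FLOPs/byte")
```

Under these assumptions, halving the bits per weight doubles both the throughput bound and the arithmetic intensity, which is why footprint and quantization sit at the center of the design.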

We are building ...

The full on-device stack
to achieve realtime
local intelligence.

1
Local models
Architectures designed for memory-bound execution and realtime decoding.
2
Inference engine
Hardware-aware optimization of tensor multiplications & other operations.
3
Quantization
Compress models without collapsing
interaction quality or latency.
4
Application layer
We start with automation on your Apple devices.
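
For readers unfamiliar with the quantization layer listed above, the sketch below shows generic symmetric round-to-nearest quantization of one small group of weights to 4 bits. This is a textbook baseline used purely for illustration and is not Mirai's co-designed scheme. The function names and example weights are hypothetical.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric quantization of one group to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.7, -0.3, 0.12, 0.0]  # made-up example values
q, scale = quantize_group(weights)
restored = dequantize_group(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

The rounding error per weight is at most half a quantization step, so the real work in any scheme is choosing groups and scales so that this error does not degrade interaction quality.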

On-device AI deserves its own frontier lab

The silicon is shipped. 1.4 billion devices are waiting. The software is the open problem. And it’s ours to solve.