About us

We’re a small, senior team building the full on-device stack to achieve realtime local intelligence.

Who we are

We are the frontier on-device AI lab. We build the models, the inference runtime, and the quantization stack from the device constraints up, so AI can run at full capability on the hardware billions of people already own.

What we believe

The trinity: model, inference stack, hardware.

Companies that focus on a single component of this trinity lack sovereignty. They're constrained by the architectural choices made by others. Mirai Labs owns all three.

Most labs treat on-device models as scaled-down versions of their cloud-focused cousins. But LLM architectures that evolved for the cloud are not well-suited to on-device setups. Cloud models operate in the arithmetic-bound regime. They treat memory as an unlimited resource and optimize for throughput across a batch.

On a device, memory is the main bottleneck. The architecture itself has to be different.
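A back-of-envelope roofline estimate makes the regime split concrete. During single-stream decoding, each generated token reads every weight roughly once, so throughput is capped by memory bandwidth long before compute runs out. The sketch below uses purely hypothetical model and hardware figures (not Mirai's measurements):

```python
# Illustrative roofline estimate for single-stream LLM decoding.
# All figures are hypothetical, chosen only to show the regime split.

def decode_ceilings(params_b, bytes_per_weight, mem_bw_gbs, peak_tflops):
    """Return (memory-bound, compute-bound) tokens/sec ceilings."""
    weight_bytes = params_b * 1e9 * bytes_per_weight  # bytes read per token
    flops = 2 * params_b * 1e9                        # ~2 FLOPs per weight (MAC)
    mem_tps = mem_bw_gbs * 1e9 / weight_bytes         # bandwidth ceiling
    comp_tps = peak_tflops * 1e12 / flops             # compute ceiling
    return mem_tps, comp_tps

# Hypothetical 3B-parameter model, 4-bit weights, phone-class SoC:
mem_tps, comp_tps = decode_ceilings(params_b=3, bytes_per_weight=0.5,
                                    mem_bw_gbs=100, peak_tflops=10)
print(f"memory ceiling:  {mem_tps:6.0f} tok/s")   # ~67 tok/s
print(f"compute ceiling: {comp_tps:6.0f} tok/s")  # ~1667 tok/s
```

With these illustrative numbers the compute ceiling sits roughly 25x above the memory ceiling: the device sits deep in the memory-bound regime, which is why cloud-style architectures translate poorly.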

What we are building

  1. Models designed for memory-bound environments.

    We focus on three core objectives: increasing arithmetic intensity at the decoding stage, reducing the size of the resident set, and fully utilizing the GPU and neural accelerators. This leads to models that differ from standard transformers in meaningful structural ways.

  2. A sovereign inference engine.

    Full control of the inference stack is what gives us a unique advantage: we are free to tailor the model to the hardware, and to bake hardware assumptions into the model. No generic runtime can do this.

  3. Quantization co-designed with the architecture.
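To see why resident-set size is a first-order objective, it helps to estimate what decoding must keep resident: quantized weights plus the KV cache. The sketch below uses hypothetical model dimensions, not any actual Mirai architecture:

```python
# Hypothetical resident-set estimate: quantized weights plus KV cache.

def resident_set_mb(params_b, weight_bits, layers, kv_heads,
                    head_dim, context_len, kv_bits):
    """Return (weights, KV cache) sizes in MB for one decoding session."""
    weights = params_b * 1e9 * weight_bits / 8
    # K and V per layer: 2 * kv_heads * head_dim values per cached token.
    kv = 2 * layers * kv_heads * head_dim * context_len * kv_bits / 8
    return weights / 1e6, kv / 1e6

# Hypothetical 3B model, 4-bit weights, 8-bit KV cache, 8K context:
w_mb, kv_mb = resident_set_mb(params_b=3, weight_bits=4, layers=32,
                              kv_heads=8, head_dim=128,
                              context_len=8192, kv_bits=8)
print(f"weights: {w_mb:.0f} MB, KV cache: {kv_mb:.0f} MB")
```

Even at 4-bit weights, this illustrative model claims around 2 GB of memory at long context on a device that must also run the OS and apps, which is why quantization and architecture have to be designed together rather than applied after the fact.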

What makes Mirai different?

Mirai isn't a wrapper around existing inference stacks. It's built from scratch for on-device execution.

Most inference engines start with cross-platform abstractions and work downward to fit mobile hardware. That makes portability easy. But you pay for it in performance, memory efficiency, and reliability.

Other teams start with generic runtimes, apply quantization after the fact, adapt cloud architectures to fit the device, and optimize for benchmarks in isolation.

We went the other way. We start from the hardware constraint, co-design model, runtime, and quantization, and build for the environment where the code actually runs: Apple Silicon, specifically.

Our team

Mirai is founded by proven entrepreneurs who built and scaled consumer AI leaders like Reface (300M users, backed by Andreessen Horowitz) and Prisma (100M MAU).

Our team is small (16 people), senior, and deeply technical. We ship fast and own problems end-to-end.

We’re advised by a former Apple Distinguished Engineer who worked on MLX, and backed by leading AI-focused funds and individuals.

Our Vision

The next decade doesn't look like a thousand apps.

It looks like one assistant that talks to everything. Your calendar, your bank, your files, your services. No app to open. No form to fill. Just intent and outcome.

That only works if the model is fast enough to be the interface itself. At 1,000 tokens per second on-device, the assistant stops returning text and starts rendering outcomes. Instantly, privately, without a network.

For AI to work seamlessly everywhere, the core must live on device.

The next generation of software won't be built on apps. It will be built on a new system layer. Not just models, not just runtimes, but a tightly integrated stack that makes intelligence native to the device. Mirai is building that layer.
