Intelligence that lives on the device
The model, the inference stack, and the hardware abstractions. Full-stack sovereignty over on-device AI for Apple Silicon.
Read our research
View our models
Talk to us
We own the full stack of on-device AI
Model Intelligence.
Block diffusion
Speculative routing
Per-layer n-gram embeddings
Architectures built for memory-bound decoding
Inference Engine.
MPSGraph kernels
W8A8 + vector-quantised weights
ASTC zero-overhead loading
Metal-native execution, no CoreML overhead
Agentic Infrastructure.
On-device execution of AI actions
Local context, memory, and state
Deterministic, low-latency decision loops
Works offline, syncs with cloud when needed
We publish on the hardest problems in on-device inference
Preprint
Speculative routing for Block-MoE inference.
Predict expert activation from prior-block states (sketched after this list).
Preprint
W8A8+VQ hybrid: near-lossless 2–4 bit compression.
SpinQuant-style rotation + vector quantiser for GEMM kernels.
Blog post
ASTC codec for zero-copy weight loading.
Hardware texture decompression repurposed for neural weights.
Blog post
Block diffusion on Apple Neural Engine.
Aligning block size to ANE width for max throughput.
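To make the routing idea concrete, here is a minimal sketch assuming a learned linear probe per expert. Every name and shape below is illustrative, not our shipped implementation.

```rust
// Illustrative only: `ExpertId`, `predict_experts`, and the probe shapes are
// hypothetical names, not the Mirai API.
type ExpertId = usize;

/// Score every expert against the pooled hidden state of block t-1
/// (dot product with a learned probe) and keep the top-k as prefetch hints.
fn predict_experts(prev_block_state: &[f32], probes: &[Vec<f32>], top_k: usize) -> Vec<ExpertId> {
    let mut scored: Vec<(ExpertId, f32)> = probes
        .iter()
        .enumerate()
        .map(|(id, w)| {
            let score: f32 = w.iter().zip(prev_block_state).map(|(a, b)| a * b).sum();
            (id, score)
        })
        .collect();
    // Highest score first.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(top_k).map(|(id, _)| id).collect()
}

fn main() {
    let prev_state = vec![0.5, -1.0, 0.25, 2.0]; // pooled state of block t-1
    let probes = vec![
        // One learned probe per expert (toy values).
        vec![1.0, 0.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0, 0.0],
        vec![0.0, 0.0, 0.0, 1.0],
        vec![0.5, 0.5, 0.5, 0.5],
    ];
    // Predict block t's experts while block t-1 is still decoding,
    // so their weights can be prefetched from disk ahead of the router.
    let hints = predict_experts(&prev_state, &probes, 2);
    println!("prefetch experts: {:?}", hints); // [2, 3]
}
```

The probe runs while block t−1 is still decoding, which is what lets disk offload overlap with compute.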
Active research areas
Block diffusion
Self-speculation
Speculative routing
Block-MoE
SpinQuant + VQ
ASTC kernels
Layer repetition
We build our own models. Trained from the ground up for on-device deployment
Block-Diffusion
Mirai-1B
Block size aligned to M-series ANE width. Converted from an autoregressive base, no full retraining.
Sparse
Mirai-3B-MoE
Block-sparse experts with speculative routing. Disk offload with prefetch overlap.
N-gram
Mirai-Embed
Per-layer Engram-style embeddings. Reduced vocabulary × richer context (sketched below).
Not scaled-down cloud models. Architectures designed around memory bandwidth and ANE throughput.
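As a rough illustration of the per-layer n-gram idea, a hashed bigram lookup can stand in for the Engram-style tables. The bucket count, hash, and n-gram order below are our assumptions for the sketch, not the Mirai-Embed design.

```rust
// Toy stand-in for Engram-style per-layer n-gram embeddings. The hashed
// bigram tables, bucket count, and widths below are assumptions.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BUCKETS: usize = 1 << 16; // buckets per layer table (assumed)
const D: usize = 4;             // toy embedding width

/// Hash an n-gram of token ids into a bucket of one layer's table.
fn bucket(ngram: &[u32], layer: usize) -> usize {
    let mut h = DefaultHasher::new();
    (layer, ngram).hash(&mut h);
    (h.finish() as usize) % BUCKETS
}

/// Look up the bigram ending at `pos` in the given layer's table,
/// adding context to that layer without a giant vocabulary matrix.
fn bigram_embedding(tokens: &[u32], pos: usize, layer: usize, table: &[[f32; D]]) -> [f32; D] {
    let mut out = [0.0f32; D];
    if pos >= 1 {
        let row = &table[bucket(&tokens[pos - 1..=pos], layer)];
        for (o, v) in out.iter_mut().zip(row) {
            *o += v;
        }
    }
    out
}

fn main() {
    // One small learned table per layer; constant here for brevity.
    let table = vec![[0.01f32; D]; BUCKETS];
    let tokens = [17u32, 42, 42, 7];
    for layer in 0..2 {
        println!("layer {layer}: {:?}", bigram_embedding(&tokens, 2, layer, &table));
    }
}
```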
We support the most popular architectures
Our inference stack
Inference Engine
Rust-native
MPSGraph for ANE access without CoreML overhead
Targets the W8A8 regime, where Apple's neural accelerators peak
W8A8
peak ANE regime
2–4 bit
VQ-compressed weights
Quantisation Research
W8A8 → int8 storage → 2–4 bit vector quantisation
SpinQuant-style rotation makes W8A8 nearly lossless
ASTC codec investigation for zero-overhead dequant
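A toy sketch of the vector-quantisation step, under the assumed pipeline above: snap each group of four weights to its nearest codebook entry and store only the index. With a 256-entry codebook, that is one byte per group of four, i.e. 2 bits per weight.

```rust
// Assumed pipeline shape, not our Metal kernels: groups of four weights,
// a shared codebook, u8 indices.
fn nearest(code: &[[f32; 4]], v: &[f32; 4]) -> u8 {
    let mut best = 0usize;
    let mut best_d = f32::INFINITY;
    for (i, c) in code.iter().enumerate() {
        // Squared Euclidean distance between weight group and codeword.
        let d: f32 = c.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
        if d < best_d {
            best_d = d;
            best = i;
        }
    }
    best as u8
}

/// Replace every 4-weight group with the index of its nearest codeword.
fn quantise(weights: &[[f32; 4]], code: &[[f32; 4]]) -> Vec<u8> {
    weights.iter().map(|v| nearest(code, v)).collect()
}

fn main() {
    // Toy codebook; in practice learned (e.g. k-means over weight groups)
    // after a SpinQuant-style rotation has smoothed outliers.
    let code = [
        [0.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0],
        [-1.0, -1.0, -1.0, -1.0],
    ];
    let weights = [[0.9, 1.1, 1.0, 0.8], [-0.2, 0.1, 0.0, -0.1]];
    println!("{:?}", quantise(&weights, &code)); // [1, 0]
}
```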
Arithmetic intensity target
I = FLOPs / bytes_DRAM
↑ block size → ↑ I during diffusion
↑ VQ ratio → ↓ bytes_DRAM
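A back-of-envelope instance for a single d × d linear layer, with toy numbers of our own choosing:

```rust
// Toy numbers, our own illustration. For B tokens decoded per step against a
// d x d weight matrix: FLOPs grow with B, weight bytes streamed from DRAM
// do not, so arithmetic intensity I rises with block size; VQ shrinks bytes.
fn intensity(block: f64, d: f64, bits_per_weight: f64) -> f64 {
    let flops = 2.0 * block * d * d;                // GEMM FLOPs for B tokens
    let bytes_dram = d * d * bits_per_weight / 8.0; // weight traffic
    flops / bytes_dram
}

fn main() {
    let d = 4096.0;
    for (block, bits) in [(1.0, 16.0), (8.0, 16.0), (8.0, 2.0)] {
        println!("B={block}, {bits}-bit weights: I = {} FLOPs/byte",
                 intensity(block, d, bits));
    }
    // B=1, 16-bit: I = 1   (memory-bound autoregressive decode)
    // B=8, 16-bit: I = 8   (bigger block raises I)
    // B=8,  2-bit: I = 64  (VQ cuts bytes_DRAM)
}
```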
Why on-device AI needs its own lab
Full-stack sovereignty.
0 architectural choices imposed by third-party frameworks. Tailoring the model to the hardware.
1.5B Apple Silicon users
The largest homogeneous compute substrate in history, underutilised for AI inference.
Privacy by architecture.
On-device inference means data never leaves the user's device.
Many innovative model architectures fail to gain adoption because inference stacks can't support them.
We exist to close that gap: own the model, own the stack, own the hardware abstractions.
Want to work on unsolved problems in on-device AI?
Open roles:
Machine Learning Engineer
Remote / SF / Europe • Full Time • Model Optimization
Machine Learning Engineer
Remote / SF / Europe • Full Time • Models & Research
Inference Engineer
Remote / SF / Europe • Full Time
Quantisation, speculative decoding, novel architectures. Small team, high ownership.
Intelligence for the edge.
Read our research
View our models
Speak with us