On-device AI research lab is building ...

Intelligence that lives on the device

The models, the inference stack, and the hardware abstractions: full-stack sovereignty over on-device AI for Apple Silicon.

Read our research

View our models

Talk to us

On-device inference · Powered by Mirai · Offline
How does Mirai run models on-device without cloud?
Thought for 1s

Evaluating memory-bound inference constraints on Apple Silicon. The key bottleneck is arithmetic intensity — operations per byte of memory traffic — not raw compute.

Ran: Load hardware constraint profile for Apple M-series
Thought briefly
Ran: Benchmark W4A8 quantization on Neural Engine
Thought briefly
Ran: Initialize block diffusion decoder pipeline
Generating response...
Ran entirely on-device · Mirai-1.1-0.6B · Offline · Private · 0 bytes to cloud

We own the full stack of on-device AI

Model Intelligence.

Block diffusion

Speculative routing

Per-layer n-gram embeddings

Architectures built for memory-bound decoding

Inference Engine.

MPSGraph kernels

W8A8 + vector-quantised weights

ASTC zero-overhead loading

Metal-native execution, no CoreML overhead

Agentic Infrastructure.

On-device execution of AI actions

Local context, memory, and state

Deterministic, low-latency decision loops

Works offline, syncs with cloud when needed
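
A minimal sketch of what such a decision loop can look like in Rust; the `Action` and `LocalContext` types and the routing policy are illustrative assumptions, not Mirai's actual agent API:

```rust
// Hypothetical sketch of a deterministic on-device decision loop.
// None of these types are Mirai's real API; they illustrate the shape:
// local context in, local action out, cloud sync strictly optional.

#[derive(Debug)]
enum Action {
    Respond(String),   // answer directly from local context
    RunTool(String),   // execute an on-device tool
    QueueSync(String), // defer to cloud sync when a network is available
}

struct LocalContext {
    memory: Vec<String>, // persisted on-device state
    online: bool,
}

/// Deterministic policy: the same context always yields the same action,
/// so latency is bounded and behaviour is reproducible offline.
fn decide(ctx: &LocalContext, request: &str) -> Action {
    if ctx.memory.iter().any(|m| m.contains(request)) {
        Action::Respond(format!("answered from local memory: {request}"))
    } else if ctx.online {
        Action::QueueSync(request.to_string())
    } else {
        Action::RunTool(format!("local_search({request})"))
    }
}

fn main() {
    let ctx = LocalContext { memory: vec!["battery status".into()], online: false };
    // Works fully offline: the loop never blocks on a network round-trip.
    println!("{:?}", decide(&ctx, "battery status"));
    println!("{:?}", decide(&ctx, "weather"));
}
```

Because the policy is a pure function of local state, the loop stays deterministic and auditable even when the device later syncs with the cloud.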

We publish on the hardest problems in on-device inference

Preprint

Speculative routing for Block-MoE inference.

Predict expert activation from prior-block states.
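
To make the idea concrete, a schematic Rust sketch of speculative routing; the linear probe scorer and all sizes are stand-in assumptions, not the method from the preprint:

```rust
// Schematic of speculative expert routing: guess which experts the *next*
// block will activate from the *previous* block's hidden state, and start
// prefetching their weights before the router actually runs.
// The per-expert linear probe below is an illustrative stand-in.

fn predict_top_k(prev_hidden: &[f32], expert_probes: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = expert_probes
        .iter()
        .enumerate()
        .map(|(i, probe)| {
            // Dot product of the prior-block state with a per-expert probe vector.
            let score = prev_hidden.iter().zip(probe).map(|(a, b)| a * b).sum();
            (i, score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let prev_hidden = vec![0.3f32, -1.2, 0.7];
    // One probe vector per expert (4 experts, hidden dim 3).
    let probes = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.0, 0.0, 1.0],
        vec![0.5, -0.5, 0.5],
    ];
    // Prefetch the predicted experts from disk while the current block decodes;
    // a wrong guess costs only the wasted prefetch, never correctness.
    let prefetch = predict_top_k(&prev_hidden, &probes, 2);
    println!("prefetch experts: {prefetch:?}");
}
```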

Preprint

W8A8+VQ hybrid: near-lossless 2–4 bit compression.

SpinQuant-style rotation + vector quantiser for GEMM kernels.
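
A toy Rust sketch of the rotate-then-vector-quantise pipeline; the fixed 45° rotation, 2-d weight groups, and 2-bit codebook are illustrative assumptions rather than the paper's configuration:

```rust
// Toy sketch of the W8A8+VQ idea: rotate weight groups so outliers spread
// across dimensions, then replace each group with the index of its nearest
// codebook entry. Codebook size 4 over 2-d groups => 2 bits per 2 weights.

fn rotate(v: [f32; 2]) -> [f32; 2] {
    // A fixed orthogonal rotation; SpinQuant-style methods learn or search
    // rotations, but any orthogonal matrix preserves the GEMM result once
    // the activations are counter-rotated.
    let c = std::f32::consts::FRAC_1_SQRT_2;
    [c * v[0] - c * v[1], c * v[0] + c * v[1]]
}

fn nearest(codebook: &[[f32; 2]], v: [f32; 2]) -> u8 {
    let mut best = (0u8, f32::INFINITY);
    for (i, e) in codebook.iter().enumerate() {
        let d = (e[0] - v[0]).powi(2) + (e[1] - v[1]).powi(2);
        if d < best.1 {
            best = (i as u8, d);
        }
    }
    best.0
}

fn main() {
    // 2-bit codebook: four centroids over rotated 2-d weight groups.
    let codebook = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]];
    let weights = [[2.0f32, 0.1], [-0.2, -1.8]];
    for w in weights {
        let code = nearest(&codebook, rotate(w));
        // At inference the kernel dequantises by codebook lookup, so DRAM
        // traffic is the 2-bit indices plus one small table.
        println!("group {:?} -> code {}", w, code);
    }
}
```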

Blog post

ASTC codec for zero-copy weight loading.

Hardware texture decompression repurposed for neural weights.

Blog post

Block diffusion on Apple Neural Engine.

Aligning block size to ANE width for max throughput.
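
A small Rust sketch of the alignment logic; the tile width of 16 is an assumed placeholder, not a published ANE specification:

```rust
// Sketch of the block-size alignment idea: choose the diffusion block size
// as a multiple of the accelerator's tile width so each denoising step
// fills the ANE's lanes instead of leaving them partially idle.

const TILE_WIDTH: usize = 16; // assumed placeholder, not an Apple spec

/// Round a requested block size up to the next tile-width multiple.
fn aligned_block_size(requested: usize) -> usize {
    (requested + TILE_WIDTH - 1) / TILE_WIDTH * TILE_WIDTH
}

fn main() {
    for req in [24, 32, 50] {
        println!("requested {req:>2} -> aligned {}", aligned_block_size(req));
    }
}
```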

Active research areas

Block diffusion

Self-speculation

Speculative routing

Block-MoE

SpinQuant + VQ

ASTC kernels

Layer repetition

We build our own models, trained for on-device deployment from the ground up

Block-Diffusion

Mirai-1B

Block size aligned to M-series ANE width. Autoregressive conversion, no full retraining.

Sparse

Mirai-3B-MoE

Block-sparse experts with speculative routing. Disk offload with prefetch overlap.

N-gram

Mirai-Embed

Per-layer Engram-style embeddings. Reduced vocabulary × richer context.
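
A hedged Rust sketch of what a per-layer hashed n-gram lookup can look like; the hashing scheme and table sizes are assumptions, not Mirai-Embed's actual design:

```rust
// Illustrative sketch of per-layer n-gram embeddings: instead of one large
// token-embedding table, each layer looks up a hashed embedding of the
// local n-gram, trading vocabulary size for richer contextual signal.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const TABLE_SIZE: usize = 1 << 16; // rows per layer (assumed)
const DIM: usize = 4;              // toy embedding width

fn ngram_row(tokens: &[u32], layer: u32) -> usize {
    let mut h = DefaultHasher::new();
    (tokens, layer).hash(&mut h); // same n-gram hashes differently per layer
    (h.finish() as usize) % TABLE_SIZE
}

fn main() {
    let table = vec![[0.0f32; DIM]; TABLE_SIZE]; // one layer's (untrained) table
    let context = [101u32, 7, 9042]; // trailing 3-gram of the input
    for layer in 0..3u32 {
        let row = ngram_row(&context, layer);
        println!("layer {layer}: row {row}, vec {:?}", table[row]);
    }
}
```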

Not scaled-down cloud models. Architectures designed around memory bandwidth and ANE throughput.

Our inference stack

Inference Engine

Rust-native

MPSGraph for ANE access without CoreML overhead

Targets W8A8 regime where Apple's neural accelerators peak

W8A8

peak ANE regime

2–4 bit

VQ compressed weights

Quantisation Research

W8A8 int8 storage · 2–4 bit vector quantisation

SpinQuant-style rotation makes W8A8 nearly lossless

ASTC codec investigation for zero-overhead dequant
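
As a concrete reference point, a minimal Rust sketch of symmetric int8 weight storage, the round trip underlying the W8A8 regime; real kernels use per-channel scales and fused dequantisation, so this single-scale version is illustrative only:

```rust
// Minimal per-tensor symmetric int8 quantisation: weights live in DRAM as
// i8 plus one f32 scale and are widened on the fly inside the kernel.

fn quantise(w: &[f32]) -> (Vec<i8>, f32) {
    let max = w.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = max / 127.0;
    let q = w.iter().map(|&x| (x / scale).round() as i8).collect();
    (q, scale)
}

fn dequantise(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

fn main() {
    let w = vec![0.8, -1.9, 0.02, 1.3];
    let (q, scale) = quantise(&w);
    // 4 bytes/weight in f32 become 1 byte/weight in i8: a 4x cut in the
    // bytes_DRAM term of the arithmetic-intensity ratio below.
    println!("int8: {q:?}, scale {scale:.4}");
    println!("round trip: {:?}", dequantise(&q, scale));
}
```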

Arithmetic intensity target

I = FLOPs / bytes_DRAM
Larger diffusion block sizes raise I; VQ compression shrinks bytes_DRAM.
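
A back-of-envelope Rust sketch of how block size moves this ratio; the parameter count and byte costs are illustrative assumptions, not measured numbers:

```rust
// Back-of-envelope arithmetic intensity for one decode step of a ~0.6B-
// parameter model. Every autoregressive step must stream all weights, so I
// stays tiny; decoding B tokens per block multiplies the FLOPs while the
// weight traffic is paid once per block.

fn main() {
    let params: f64 = 0.6e9;            // weights touched per step (assumed)
    let bytes_per_weight = 1.0;         // int8 storage (W8A8)
    let flops_per_token = 2.0 * params; // one multiply-add per weight

    for block in [1u32, 8, 32] {
        let flops = flops_per_token * block as f64;
        let bytes_dram = params * bytes_per_weight; // weights read once per block
        let intensity = flops / bytes_dram;         // I = FLOPs / bytes_DRAM
        println!("block {block:>2}: I = {intensity:.1} FLOP/byte");
    }
}
```

At block size 1 the ratio is about 2 FLOP/byte, far below what the hardware can sustain; widening the block is what buys back arithmetic intensity.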

Why on-device AI needs its own lab

Full-stack sovereignty.

0 architectural choices imposed by third-party frameworks. We tailor our models to the hardware.

1.5B Apple Silicon users

The largest homogeneous compute substrate in history, underutilised for AI inference.

Privacy by architecture.

On-device inference means data never leaves the user's device.

Many innovative model architectures fail to gain adoption because inference stacks can't support them.
We exist to close that gap: own the model, own the stack, own the hardware abstractions.

Want to work on unsolved problems in on-device AI?

Open roles:

Machine Learning Engineer

Remote / SF / Europe · Full Time · Models Optimization

Machine Learning Engineer

Remote / SF / Europe · Full Time · Models & Research

Inference Engineer

Remote / SF / Europe · Full Time

Quantisation, speculative decoding, novel architectures. Small team, high ownership.

On-device AI research lab is building ...

Intelligence for the edge.

Read our research

View our models


Speak with us