Apple Inference SDK for iOS & Mac

Run high-performance inference locally

Mirai unlocks the full power of Apple’s GPU and ANE, delivering best-in-class inference speed for AI models

Run the most popular small LLMs

First 10,000 devices for free

Optimized for Apple Silicon

Instant setup, no guesswork

Pick a use case, choose the model, and drop it into your code. Mirai handles quantization, decoding strategies, and structured output.
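
For illustration, an integration could look something like the sketch below. The Mirai module, the MiraiChat type, and the model identifier are hypothetical placeholder names, not the SDK's documented API:

```swift
import Mirai  // hypothetical module name, for illustration only

func greet() async throws {
    // Hypothetical API: pick a model for the chat use case; the SDK
    // handles quantization and the decoding strategy behind the scenes.
    let chat = try MiraiChat(model: .llama3_2_1B_Instruct)
    let reply = try await chat.generate(prompt: "Say hello in five words.")
    print(reply)
}
```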

Built specifically for Apple hardware. Optimized to the limit

Built from the ground up to outperform existing solutions like llama.cpp and MLX

Metric                   16 Pro Max (A18 Pro)   MacBook Pro (M1, 2020)   M1 Ultra   M2       M4 Max
Time to first token, s   0.303                  0.480                    0.066      0.188    0.041
Tokens per sec, t/s      20.598                 16.7                     197.093    35.572   172.276

* Llama-3.2-1B-Instruct, float16 precision, 37 input tokens

Fastest inference on Apple devices

Up to 3x faster than existing solutions, depending on use case

Structured output

COMING SOON

Schema-aligned JSON results out of the box. Perfect for workflows that demand reliability
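
As a sketch of how schema-aligned output could look in Swift, assuming a hypothetical Codable-based API (MiraiChat and generate(prompt:as:) are illustrative names, not the documented interface):

```swift
import Foundation
import Mirai  // hypothetical module name, for illustration only

// The shape we want back: decoding is constrained so the model's
// output always parses into this Codable type.
struct Sentiment: Codable {
    let label: String      // e.g. "positive", "negative", "neutral"
    let confidence: Double
}

func classify(_ text: String) async throws -> Sentiment {
    let chat = try MiraiChat(model: .llama3_2_1B_Instruct)  // hypothetical
    return try await chat.generate(prompt: text, as: Sentiment.self)
}
```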

Low latency. Energy-efficient

Blazing-fast time to first token with low-level, hardware-aware optimizations

Speculative decoding built-in

Accelerated generation with speculative decoding. Built-in, not bolted on.

Up to 3x speed improvement, depending on use case
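
Speculative decoding pairs a small draft model with the target model: the draft cheaply proposes a run of tokens, and the target verifies them in a single forward pass, keeping the longest accepted prefix. A minimal configuration sketch, assuming a hypothetical options type that exposes the feature (both model identifiers and all names below are illustrative):

```swift
import Mirai  // hypothetical module name, for illustration only

// Hypothetical configuration: a small draft model proposes several
// tokens per step, and the larger target model verifies them in one
// forward pass, keeping the longest accepted prefix.
func makeChat() throws -> MiraiChat {
    var options = MiraiGenerationOptions()  // hypothetical type
    options.speculativeDecoding = .enabled(draft: .llama3_2_1B_Instruct)
    return try MiraiChat(model: .llama3_1_8B_Instruct, options: options)
}
```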

Choose from powerful on-device use cases

Integrate in minutes, with no unnecessary complexity (see the sketch after the list)

General Chat

Conversational AI, running on-device

Classification

Tag text by topic, intent, or sentiment

Summarization

Quickly turn long text into an easy-to-read summary

Custom

Build your own use case

Camera

COMING SOON

Process images with local models

Voice

COMING SOON

Turn voice into actions or text
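
As referenced above, each use case could reduce to a short call. A summarization sketch, where MiraiSummarizer and summarize(_:maxSentences:) are hypothetical names rather than the documented API:

```swift
import Mirai  // hypothetical module name, for illustration only

func tldr(_ article: String) async throws -> String {
    // Hypothetical summarization use case: long text in, short summary out.
    let summarizer = try MiraiSummarizer(model: .llama3_2_1B_Instruct)
    return try await summarizer.summarize(article, maxSentences: 3)
}
```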

First 10,000 devices for free

Setup in less than 10 minutes

Full control and performance

Run the most popular small LLMs

Reduce AI costs by up to 40%

Perfect fit for

Messaging & assistants

Productivity & writing tools

Finance, health & compliance apps

On-device intelligence layers

Set up your AI project in 10 minutes

With the Mirai SDK, creating and deploying on-device AI is as easy as generating an API key and choosing a use case.
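
A minimal setup sketch, assuming the SDK is configured once at app launch; Mirai.configure(apiKey:) is a hypothetical entry point, not the documented call:

```swift
import SwiftUI
import Mirai  // hypothetical module name, for illustration only

@main
struct DemoApp: App {
    init() {
        // Hypothetical: register the API key generated in the Mirai console.
        Mirai.configure(apiKey: "YOUR_API_KEY")
    }

    var body: some Scene {
        WindowGroup {
            Text("Mirai on-device demo")
        }
    }
}
```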