Apple Inference SDK for iOS & Mac

Run high-performance inference locally

Mirai unlocks the full power of Apple’s GPU and ANE, delivering best-in-class inference speed for AI models

Run the most popular small LLMs

First 10,000 devices for free

Optimized for Apple Silicon

Instant setup, no guesswork

Pick a use case, choose the model, and drop it into your code. Mirai handles quantization, decoding strategies, and structured output.
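
For illustration, an integration could look something like the sketch below. The Mirai module, the MiraiChat type, and the model identifier are hypothetical placeholder names, not the SDK's documented API:

```swift
import Mirai  // hypothetical module name, for illustration only

func greet() async throws {
    // Hypothetical API: pick a model for the chat use case; the SDK
    // handles quantization and the decoding strategy behind the scenes.
    let chat = try MiraiChat(model: .llama3_2_1B_Instruct)
    let reply = try await chat.generate(prompt: "Say hello in five words.")
    print(reply)
}
```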

Built specifically for Apple hardware. Optimized to the limit

Built from the ground up to outperform existing solutions like llama.cpp and MLX

Metric                   16 Pro Max (A18 Pro)   MacBook Pro (M1, 2020)   M1 Ultra   M2       M4 Max
Time to first token, s   0.303                  0.480                    0.066      0.188    0.041
Tokens per sec, t/s      20.598                 16.7                     197.093    35.572   172.276

* Llama-3.2-1B-Instruct, float16 precision, 37 input tokens

Fastest inference on Apple devices

Up to 3x faster than existing solutions, depending on use case

Structured output

COMING SOON

Schema-aligned JSON results out of the box. Perfect for workflows that demand reliability
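
As a sketch of how schema-aligned output could look in Swift, assuming a hypothetical Codable-based API (MiraiChat and generate(prompt:as:) are illustrative names, not the documented interface):

```swift
import Foundation
import Mirai  // hypothetical module name, for illustration only

// The shape we want back: decoding is constrained so the model's
// output always parses into this Codable type.
struct Sentiment: Codable {
    let label: String      // e.g. "positive", "negative", "neutral"
    let confidence: Double
}

func classify(_ text: String) async throws -> Sentiment {
    let chat = try MiraiChat(model: .llama3_2_1B_Instruct)  // hypothetical
    return try await chat.generate(prompt: text, as: Sentiment.self)
}
```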

Low latency. Energy-efficient

Blazing-fast time to first token with low-level, hardware-aware optimizations

Speculative decoding built-in

Accelerated generation with speculative decoding. Built-in, not bolted on.

Up to 3x speed improvement, depending on use case
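
Speculative decoding pairs a small draft model with the target model: the draft cheaply proposes a run of tokens, and the target verifies them in a single forward pass, keeping the longest accepted prefix. A minimal configuration sketch, assuming a hypothetical options type that exposes the feature (both model identifiers and all names below are illustrative):

```swift
import Mirai  // hypothetical module name, for illustration only

// Hypothetical configuration: a small draft model proposes several
// tokens per step, and the larger target model verifies them in one
// forward pass, keeping the longest accepted prefix.
func makeChat() throws -> MiraiChat {
    var options = MiraiGenerationOptions()  // hypothetical type
    options.speculativeDecoding = .enabled(draft: .llama3_2_1B_Instruct)
    return try MiraiChat(model: .llama3_1_8B_Instruct, options: options)
}
```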

Choose from powerful on-device use cases

Integrate in minutes, with no unnecessary complexity (see the sketch after the list)

General Chat

Conversational AI, running on-device

Classification

Tag text by topic, intent, or sentiment

Summarization

Quickly turn long text into an easy-to-read summary

Custom

Build your own use case

Camera

COMING SOON

Process images with local models

Voice

COMING SOON

Turn voice into actions or text
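
As referenced above, each use case could reduce to a short call. A summarization sketch, where MiraiSummarizer and summarize(_:maxSentences:) are hypothetical names rather than the documented API:

```swift
import Mirai  // hypothetical module name, for illustration only

func tldr(_ article: String) async throws -> String {
    // Hypothetical summarization use case: long text in, short summary out.
    let summarizer = try MiraiSummarizer(model: .llama3_2_1B_Instruct)
    return try await summarizer.summarize(article, maxSentences: 3)
}
```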

First 10,000 devices for free

Setup in less than 10 minutes

Full control and performance

Run the most popular small LLMs

Reduce AI costs by up to 40%

Perfect fit for

Messaging & assistants

Productivity & writing tools

Finance, health & compliance apps

On-device intelligence layers

Set up your AI project in 10 minutes

With the Mirai SDK, creating and deploying on-device AI is as easy as generating an API key and choosing a use case.
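
A minimal setup sketch, assuming the SDK is configured once at app launch; Mirai.configure(apiKey:) is a hypothetical entry point, not the documented call:

```swift
import SwiftUI
import Mirai  // hypothetical module name, for illustration only

@main
struct DemoApp: App {
    init() {
        // Hypothetical: register the API key generated in the Mirai console.
        Mirai.configure(apiKey: "YOUR_API_KEY")
    }

    var body: some Scene {
        WindowGroup {
            Text("Mirai on-device demo")
        }
    }
}
```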