Mirai iOS SDK Public Preview

By the Mirai team

Apr 3, 2025

Today, we’re excited to share the first public preview of our inference engine for iOS devices. It fully leverages the hardware’s potential, as described in this blog post, to achieve superior performance in specific use cases.

Here is a side-by-side comparison between the Mirai inference engine and MLX (Apple’s framework):

Chat, iPhone 16 Pro:

Mirai inference engine vs MLX (Apple Framework)

Summarization, iPhone 16 Pro: 

Mirai inference engine vs MLX (Apple Framework)

As part of the preview, you can run Llama-3.2-1b-Instruct-float16 on your device and choose one of the following configurations:

  • Chat

  • Summarization

  • Classification
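To give a feel for how a task-specific configuration might be selected, here is a minimal, purely hypothetical Swift sketch. The names `MiraiEngine`, `EngineConfiguration`, and `generate` are illustrative assumptions, not the actual SDK API — consult the SDK documentation for the real symbols.

```swift
import Foundation

// Hypothetical configuration options mirroring the preview's three modes.
enum EngineConfiguration {
    case chat
    case summarization
    case classification
}

// Stand-in for an engine type; the real SDK interface may differ substantially.
struct MiraiEngine {
    let modelID: String
    let configuration: EngineConfiguration

    // A production API would likely stream tokens asynchronously;
    // this placeholder just echoes its inputs.
    func generate(prompt: String) -> String {
        "(output for \"\(prompt)\" with \(configuration))"
    }
}

let engine = MiraiEngine(
    modelID: "Llama-3.2-1b-Instruct-float16",
    configuration: .summarization
)
print(engine.generate(prompt: "Summarize this article"))
```

The idea is simply that the same on-device model is paired with one of the preview’s configurations at initialization time.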

In upcoming releases, we’ll add support for additional models, including VLMs, and provide specific configurations for more use cases, such as structured output.

If you have any questions, feel free to drop us a message at contact@getmirai.co.

