Qwen3-0.6B-MLX-8bit

Run locally on Apple devices with Mirai

Type: Local

From: Alibaba

Quantisation: uint8

Precision: float16

Size: 0.6B

Source: Hugging Face

Qwen3-0.6B is a compact causal language model with 0.6 billion parameters from the latest Qwen3 series. It features unique support for seamlessly switching between thinking mode, which employs complex logical reasoning for mathematics and coding tasks, and non-thinking mode, which prioritizes efficient general-purpose dialogue. The model delivers significant improvements in reasoning capabilities, human preference alignment for creative writing and role-playing, agent capabilities for tool integration, and supports over 100 languages with strong multilingual instruction-following abilities. This MLX-8bit quantized version is optimized for efficient inference on Apple Silicon devices. The model has a context length of 32,768 tokens, 28 layers, and uses grouped query attention with 16 attention heads for queries and 8 for key-value pairs. Users can dynamically control thinking behavior through both hard switches via the enable_thinking parameter and soft switches using /think and /no_think tags in prompts for multi-turn conversations.

1. Choose framework
2. Add the Mirai SDK via Swift Package Manager: https://github.com/trymirai/uzu-swift
3. Set your Mirai API key
4. Apply code
