Qwen3-8B-MLX-8bit is an 8.2-billion-parameter causal language model from the latest generation of the Qwen series. Within a single model, it can switch seamlessly between a thinking mode for complex logical reasoning, mathematics, and coding, and a non-thinking mode for efficient general-purpose dialogue. The model delivers significant improvements in reasoning, in alignment with human preferences for creative writing and multi-turn conversation, and in agent capabilities with tool integration. It supports over 100 languages and dialects with strong multilingual instruction following and translation. The native context length is 32,768 tokens, extensible to 131,072 tokens with YaRN scaling. This MLX-optimized 8-bit quantized build targets efficient inference on Apple silicon while preserving the capabilities of the full Qwen3 model family.
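As a minimal sketch of the mode switch, Qwen3 documents soft-switch tokens (`/think` and `/no_think`) that can be appended to a user turn to toggle the reasoning block. The helper below formats a single-turn ChatML-style prompt by hand for illustration; the exact template string is an assumption here, and in practice you would let `tokenizer.apply_chat_template(..., enable_thinking=...)` produce it:

```python
# Sketch: a hand-rolled ChatML-style prompt for Qwen3 with the soft-switch
# thinking toggle. The template layout is an assumption for illustration;
# prefer tokenizer.apply_chat_template(..., enable_thinking=...) in real use.

def build_prompt(user_message: str, thinking: bool = True) -> str:
    """Format a single-turn prompt in Qwen's ChatML style."""
    # Soft switch: appending /no_think asks the model to skip the
    # <think>...</think> reasoning block for this turn.
    switch = "" if thinking else " /no_think"
    return (
        f"<|im_start|>user\n{user_message}{switch}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("What is 17 * 24?", thinking=False)
print(prompt)

# With the mlx-lm package, the prompt would then be fed to the model
# (commented out here because it downloads the 8B checkpoint):
# from mlx_lm import load, generate
# model, tokenizer = load("Qwen/Qwen3-8B-MLX-8bit")
# text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
```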
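To extend the context window from the native 32,768 tokens toward 131,072, the Qwen3 documentation describes adding a YaRN `rope_scaling` entry to the model's `config.json`; a factor of 4.0 multiplies the original 32,768-token window to 131,072. The fragment below follows that documented pattern (treat it as a sketch and check it against the upstream model card for your serving stack):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note that static YaRN scaling applies the factor regardless of input length, so it is generally recommended only when long-context processing is actually needed.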