Qwen3-4B-MLX-4bit

Run locally on Apple devices with Mirai

Type: Local
From: Alibaba
Quantisation: uint4
Precision: float16
Size: 4B
Source: Hugging Face

Qwen3-4B-MLX-4bit is the MLX-quantized version of Qwen3-4B, the latest generation of large language models in the Qwen series. It is a 4-billion-parameter causal language model that supports seamless switching between a thinking mode for complex logical reasoning, mathematics, and coding tasks, and a non-thinking mode for efficient general-purpose dialogue. The model excels in reasoning, human preference alignment for creative writing and role-playing, and agent capabilities with tool integration, and it supports over 100 languages with strong multilingual instruction following and translation. It natively supports context lengths of up to 32,768 tokens, extendable to 131,072 tokens using YaRN scaling.
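A minimal local-inference sketch using the open-source mlx-lm package (pip install mlx-lm) on Apple silicon. The checkpoint id Qwen/Qwen3-4B-MLX-4bit and the enable_thinking chat-template switch come from the upstream Qwen3 release; treat both as assumptions to verify against the versions you install.

```python
# Minimal sketch: run the 4-bit MLX checkpoint locally with mlx-lm.
# Assumes the checkpoint is published as "Qwen/Qwen3-4B-MLX-4bit".
from mlx_lm import load, generate

model, tokenizer = load("Qwen/Qwen3-4B-MLX-4bit")

messages = [{"role": "user", "content": "Summarise YaRN scaling in two sentences."}]

# Qwen3's chat template accepts an enable_thinking flag: True routes the
# request through thinking mode (reasoning emitted inside <think> tags),
# False produces a direct, non-thinking reply.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # flip to True for maths, coding, logic
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```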

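The 131,072-token figure relies on YaRN. The upstream Qwen3 card enables it by adding a rope_scaling block to the checkpoint's config.json; the sketch below applies that edit to a hypothetical local copy. Whether the MLX runtime honours this block depends on the installed mlx-lm version, so check it rather than assuming it.

```python
# Hedged sketch: enable YaRN context extension by editing config.json,
# following the rope_scaling block documented in the upstream Qwen3 card.
# The local path is hypothetical; verify that your mlx-lm version
# actually applies this rope_scaling configuration.
import json
from pathlib import Path

config_path = Path("Qwen3-4B-MLX-4bit/config.json")  # hypothetical local checkout
config = json.loads(config_path.read_text())

# factor = target context / native context: 131072 / 32768 = 4.0
config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config["max_position_embeddings"] = 131072

config_path.write_text(json.dumps(config, indent=2))
```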
