Qwen3-0.6B is a compact causal language model with 0.6 billion parameters from the latest Qwen3 series. It supports seamless switching between thinking mode, which applies step-by-step logical reasoning to mathematics and coding tasks, and non-thinking mode, which prioritizes efficient general-purpose dialogue. The series delivers significant improvements in reasoning, human-preference alignment for creative writing and role-playing, and agent capabilities for tool integration, and it supports over 100 languages with strong multilingual instruction following. This MLX-8bit quantized version is optimized for efficient inference on Apple Silicon devices. The model has a context length of 32,768 tokens and 28 layers, and uses grouped-query attention (GQA) with 16 attention heads for queries and 8 for key-value pairs. Users can control thinking behavior through a hard switch, the enable_thinking parameter, and soft switches, the /think and /no_think tags placed in prompts during multi-turn conversations.
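The hard/soft switch interaction described above can be sketched as follows. This is a minimal illustration, not the model's actual implementation: the function name `resolve_thinking` is hypothetical, and it assumes (per the Qwen3 usage guidance) that `enable_thinking` sets the default while the most recent /think or /no_think tag in a user turn overrides it.

```python
def resolve_thinking(messages, enable_thinking=True):
    """Illustrative sketch: enable_thinking is the hard-switch default;
    the most recent /think or /no_think tag in a user message wins."""
    mode = enable_thinking
    for m in messages:
        if m.get("role") != "user":
            continue
        t = m["content"].rfind("/think")
        nt = m["content"].rfind("/no_think")
        if t == -1 and nt == -1:
            continue  # no soft switch in this turn; keep current mode
        mode = t > nt  # later tag within the turn takes effect
    return mode
```

For example, a conversation whose last user turn ends in /no_think would run in non-thinking mode even when `enable_thinking=True`.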
Alibaba
available local models on Mirai:
| Name | Quantisation | Size |
| --- | --- | --- |
| Qwen2.5-Coder-0.5B-Instruct | uint8 | 0.5B |
| Qwen2.5-Coder-1.5B-Instruct | uint8 | 1.5B |
| Qwen2.5-Coder-14B-Instruct | uint8 | 14B |
| Qwen2.5-Coder-32B-Instruct | uint8 | 32B |
| Qwen2.5-Coder-3B-Instruct | uint8 | 3B |
| Qwen2.5-Coder-7B-Instruct | uint8 | 7B |
| Qwen3-0.6B | uint8 | 0.6B |
| Qwen3-0.6B-MLX-4bit | uint8 | 0.6B |
| Qwen3-0.6B-MLX-8bit | uint8 | 0.6B |
| Qwen3-1.7B | uint8 | 1.7B |
| Qwen3-1.7B-MLX-4bit | uint8 | 1.7B |
| Qwen3-1.7B-MLX-8bit | uint8 | 1.7B |
| Qwen3-14B | uint8 | 14B |
| Qwen3-14B-AWQ | uint8 | 14B |
| Qwen3-14B-MLX-4bit | uint8 | 14B |
| Qwen3-14B-MLX-8bit | uint8 | 14B |
| Qwen3-32B | uint8 | 32B |
| Qwen3-32B-AWQ | uint8 | 32B |
| Qwen3-32B-MLX-4bit | uint8 | 32B |
| Qwen3-4B | uint8 | 4B |
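The grouped-query attention layout noted for Qwen3-0.6B (28 layers, 8 key-value heads against 16 query heads) halves the KV cache relative to full multi-head attention. A rough back-of-the-envelope sketch of per-token KV-cache size, assuming a head dimension of 128 and fp16 activations (neither figure is stated above; both are assumptions for illustration):

```python
def kv_cache_bytes_per_token(layers=28, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Per-token KV-cache footprint: K and V tensors (factor of 2)
    across all layers. Defaults reflect assumed Qwen3-0.6B settings."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

# Under these assumptions: 2 * 28 * 8 * 128 * 2 = 114,688 bytes (~112 KiB) per token;
# with 16 KV heads (no GQA) it would be twice that.
```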