Type: Local
From: Alibaba
Quantisation: uint4
Precision: No
Size: 0.6B
Qwen3-0.6B-MLX-4bit is a 600-million-parameter language model from the latest Qwen3 series, optimized for inference on Apple Silicon using the MLX framework with 4-bit quantization. It is a causal language model with 28 layers, 16 attention heads, and a 32,768-token context length.

The model can switch seamlessly between a thinking mode for complex logical reasoning, mathematics, and coding tasks, and a non-thinking mode for efficient general-purpose dialogue. In thinking mode, it generates reasoning chains wrapped in XML-style tags before producing its final response, similar to its larger sibling QwQ-32B. The series brings significant improvements in reasoning, human-preference alignment for creative writing and role-playing, agent capabilities for tool integration, and support for over 100 languages with strong multilingual instruction following.

Thinking behaviour can be enabled or disabled through a hard switch at prompt-formatting time, or through soft switches, the user commands /think and /no_think, within a conversation. The model is designed to work with the latest versions of the transformers and mlx_lm libraries and supports agentic use cases through tool calling.
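As a minimal sketch of local inference with mlx_lm, the snippet below loads the 4-bit MLX weights and formats a prompt with the hard switch. The repo id `Qwen/Qwen3-0.6B-MLX-4bit`, the prompt text, and the generation parameters are illustrative assumptions, not values taken from this card.

```python
# Minimal sketch: running Qwen3-0.6B-MLX-4bit locally with mlx_lm.
# Assumes `pip install mlx-lm` on an Apple Silicon machine; the repo id
# and max_tokens value below are assumptions for illustration.
from mlx_lm import load, generate

# Download (if needed) and load the 4-bit MLX weights plus tokenizer.
model, tokenizer = load("Qwen/Qwen3-0.6B-MLX-4bit")

messages = [{"role": "user", "content": "Explain what 4-bit quantization is."}]

# Hard switch: enable_thinking toggles the reasoning-chain behaviour
# at prompt-formatting time via the chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set to False for plain, non-thinking replies
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```

With enable_thinking=True, the output contains the reasoning chain in XML-style tags ahead of the final answer; with False, the model answers directly.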
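The soft switch works inside the conversation itself: appending /think or /no_think to a user turn overrides the thinking behaviour for that turn. A short sketch, reusing the model and tokenizer loaded above (the prompt text is again an assumption):

```python
# Soft switch sketch: /no_think in the user turn suppresses the
# reasoning chain for this turn without changing the hard switch.
messages = [
    {"role": "user", "content": "Summarize the plot of Hamlet. /no_think"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```

In multi-turn conversations the most recent /think or /no_think directive is the one that takes effect, which makes per-turn control possible without rebuilding the prompt template.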