Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts models built upon extensive training. Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support. A key distinguishing feature is its ability to seamlessly switch between a thinking mode, for complex logical reasoning, math, and coding, and a non-thinking mode, for efficient general-purpose dialogue, all within a single model.

Qwen3 significantly enhances reasoning capabilities, surpassing the previous QwQ and Qwen2.5 instruct models on mathematics, code generation, and commonsense logical reasoning. It excels in human preference alignment for creative writing, role-playing, multi-turn dialogue, and instruction following, and demonstrates strong agent capabilities with precise tool integration. It also supports over 100 languages and dialects, with strong multilingual instruction following and translation.

Qwen3-1.7B is a 1.7-billion-parameter causal language model with 28 layers, grouped-query attention (16 query heads and 8 key-value heads), and a 32,768-token context length. It was trained through both pretraining and post-training stages and can be used with the latest Hugging Face transformers library for text generation in both thinking and non-thinking modes.
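As a concrete starting point, here is a minimal sketch of the standard transformers text-generation flow for this model. It assumes a recent transformers version with Qwen3 support; the enable_thinking keyword passed to the chat template is what toggles between the two modes described above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-1.7B"

# Load tokenizer and model (a recent transformers release is required for Qwen3).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Give me a short introduction to large language models."}
]

# enable_thinking=True selects thinking mode; set it to False for
# efficient, direct dialogue without an intermediate reasoning trace.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

In thinking mode the model emits its reasoning before the final answer, so allow a larger max_new_tokens budget than you would for non-thinking chat.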