Qwen2.5-Coder is the latest series of code-specific Qwen large language models, covering six mainstream model sizes from 0.5 to 32 billion parameters. This instruction-tuned 32B variant brings significant improvements in code generation, code reasoning, and code fixing, with training scaled up to 5.5 trillion tokens spanning source code, text-code grounding data, and synthetic data. The model achieves state-of-the-art performance among open-source code LLMs, with coding abilities matching those of GPT-4o, while maintaining strengths in mathematics and general competencies to support real-world applications such as code agents.

Qwen2.5-Coder-32B-Instruct is a causal language model with 32.5 billion parameters, built on a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias. It has 64 layers, uses grouped query attention with 40 query heads and 8 key-value heads, and supports long-context processing up to 128K tokens. The model combines pretraining and post-training stages to provide a comprehensive foundation for practical coding applications.
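As a minimal sketch of how the instruction-tuned model might be used, the snippet below loads Qwen2.5-Coder-32B-Instruct with Hugging Face transformers and generates a completion for a coding prompt via the model's chat template. It assumes a recent transformers version, sufficient GPU memory for a 32.5B-parameter model, and the prompt shown is purely illustrative.

```python
# Sketch: load Qwen2.5-Coder-32B-Instruct and generate a response to a coding prompt.
# Assumes a recent transformers release and enough GPU memory (or multi-GPU sharding)
# for a 32.5B-parameter model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # shard across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example coding prompt (illustrative only).
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a quicksort function in Python."},
]

# Apply the model's chat template, then generate.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens before decoding the completion.
completion = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(completion)
```

The same workflow applies to long-context inputs up to the model's supported window; only the prompt length and generation settings change.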