Qwen2.5-Coder is the latest series of code-specific large language models from Alibaba Cloud, available in six model sizes ranging from 0.5 to 32 billion parameters. This instruction-tuned 0.5B variant was trained on 5.5 trillion tokens of source code, text-code grounding data, and synthetic data, and brings significant improvements in code generation, code reasoning, and code fixing. The model maintains strong capabilities in mathematics and general competencies while serving as a comprehensive foundation for real-world applications such as code agents. The 0.5B model is a causal language model with 24 transformer layers, 14 query attention heads and 2 key-value heads using grouped query attention, and a full context length of 32,768 tokens. Built on the strong Qwen2.5 base, it incorporates architectural improvements including RoPE positional embeddings, SwiGLU activations, and RMSNorm normalization to deliver efficient coding capabilities in a lightweight package.
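For reference, a minimal usage sketch with the Hugging Face `transformers` library might look like the following. The prompt, system message, and generation settings are illustrative only, and the model ID assumes the standard `Qwen/Qwen2.5-Coder-0.5B-Instruct` naming on the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub model ID for the instruction-tuned 0.5B variant.
model_name = "Qwen/Qwen2.5-Coder-0.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick an appropriate dtype automatically
    device_map="auto",    # place the (small) model on the available device
)

# Illustrative coding prompt; any chat-style request works the same way.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]

# Render the conversation with the model's chat template before tokenizing.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens so only the newly generated answer is decoded.
generated = output_ids[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

Because the model fits comfortably on a single consumer GPU or even CPU, the same snippet can be used for quick local experimentation with code generation, explanation, or repair prompts.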