Qwen3-4B-Thinking from Alibaba – Run On-Device with Mirai.

Qwen3-4B-Thinking

Run locally on Apple devices with Mirai

Type: Local
From: Alibaba
Quantisation: No
Precision: float16
Size: 4B

Source

Qwen3-4B-Thinking-2507 is a 4-billion-parameter causal language model whose reasoning capability was scaled up over three months of continued development. It targets complex reasoning tasks (logical reasoning, mathematics, science, coding, and academic problems that typically require human expertise) while retaining strong general capabilities in instruction following, tool use, text generation, and alignment with human preferences. The model has a native context length of 262,144 tokens and runs only in thinking mode: it always generates an internal reasoning chain before producing its final answer. Compared with its predecessor, it reasons more deeply and with higher quality, and it performs better on long-context understanding tasks. Because highly complex problems benefit from longer outputs, the recommended output length is 32,768 tokens for standard queries and up to 81,920 tokens for challenging mathematics and coding problems.
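The output-length recommendations above can be turned into a small helper. The following is a minimal sketch, not part of Mirai or Qwen tooling: the function name `output_budget` and the `hard_problem` flag are hypothetical, and it assumes that the prompt and the generated tokens (reasoning chain included) share the 262,144-token context window.

```python
# Figures quoted in the model description above.
CONTEXT_WINDOW = 262_144   # native context length, in tokens
STANDARD_OUTPUT = 32_768   # recommended output length for standard queries
HARD_OUTPUT = 81_920       # recommended ceiling for hard math and coding problems


def output_budget(prompt_tokens: int, hard_problem: bool = False) -> int:
    """Return a max-output-token value that still fits in the context window.

    `hard_problem` is a hypothetical flag set by the caller; the model
    itself exposes no such switch.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    recommended = HARD_OUTPUT if hard_problem else STANDARD_OUTPUT
    # Assumption: prompt and output share one window, so cap by what is left.
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(recommended, remaining))


if __name__ == "__main__":
    print(output_budget(1_000))                     # short prompt, standard query
    print(output_budget(1_000, hard_problem=True))  # short prompt, hard problem
    print(output_budget(250_000))                   # long prompt leaves less room
```

The returned value would typically be passed as the maximum number of new tokens to whatever runtime performs generation. Since the model always emits its reasoning before the answer, a budget that is too small may be spent entirely on the reasoning chain.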
