Polaris-4B-Preview

Run locally on Apple devices with Mirai

Type: Local
From: POLARIS-Project
Quantisation: No
Precision: float16
Size: 4B
Source: Hugging Face

Polaris is an open-source post-training method that uses reinforcement learning (RL) to strengthen a model's reasoning ability. It shows that even a small model such as Qwen3-4B can improve substantially on challenging reasoning tasks through RL optimization, with reported benchmark results surpassing commercial systems such as Claude-4-Opus, Grok-3-Beta, and o3-mini-high. The method combines several techniques, including data difficulty analysis and distribution mapping; diversity-based rollout sampling with progressively increasing temperature; inference-time length extrapolation, which generates longer chain-of-thought reasoning than the sequences used during training; and multi-stage training to improve exploration efficiency. Polaris relies on open-source data and academic-level resources to advance open-recipe reasoning models.
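To make the role of sampling temperature concrete, here is a minimal sketch of temperature-scaled softmax, the standard mechanism by which a higher temperature spreads probability mass over more tokens and so yields more diverse rollouts. It is not the Polaris implementation: the logits and the temperature values below are hypothetical, chosen only to show that entropy rises as temperature rises.

```swift
import Foundation

/// Softmax over logits divided by a sampling temperature.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    precondition(temperature > 0, "temperature must be positive")
    let scaled = logits.map { $0 / temperature }
    // Subtract the maximum before exponentiating for numerical stability.
    let maxLogit = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

/// Shannon entropy in nats; higher means a more diverse distribution.
func entropy(_ probs: [Double]) -> Double {
    let sum = probs.filter { $0 > 0 }.reduce(0.0) { $0 + $1 * log($1) }
    return -sum
}

// Hypothetical next-token logits and a hypothetical increasing
// temperature schedule (illustrative values, not taken from Polaris).
let logits: [Double] = [2.0, 1.0, 0.5, -1.0]
let temperatures: [Double] = [1.0, 1.2, 1.4]

let entropies = temperatures.map { entropy(softmax(logits, temperature: $0)) }
for (t, h) in zip(temperatures, entropies) {
    print("temperature \(t): entropy \(h)")
}
```

Each step up in temperature flattens the distribution, so sampled rollouts for the same prompt differ more from one another.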

1. Choose framework
2. Install the Mirai SDK with Swift Package Manager (SPM): https://github.com/trymirai/uzu-swift
3. Set Mirai API key
4. Apply code
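For the SPM install step, a package manifest might look like the sketch below. Only the repository URL comes from this page. The package name `PolarisDemo`, the platform versions, the `main` branch, and the product name `Uzu` are assumptions for illustration, so check the repository's README for the actual product name and a tagged version.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "PolarisDemo", // hypothetical app name
    platforms: [.macOS(.v14), .iOS(.v17)], // assumed minimum versions
    dependencies: [
        // URL from the install step; tracking "main" is an assumption.
        .package(url: "https://github.com/trymirai/uzu-swift", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "PolarisDemo",
            dependencies: [
                // "Uzu" is an assumed product name; verify it in the repository.
                .product(name: "Uzu", package: "uzu-swift"),
            ]
        ),
    ]
)
```

In an Xcode project, the same dependency can be added through File > Add Package Dependencies by pasting the repository URL.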