Sky-T1-32B-Preview

This is a 32B reasoning model trained from Qwen2.5-32B-Instruct on 17K training examples. Its performance is on par with o1-preview on both math and coding benchmarks. Please see our blog post for more details.

  • Developed by: the NovaSky Team from the Sky Computing Lab at UC Berkeley.
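
For reference, here is a minimal inference sketch using the Hugging Face transformers library. The repository id NovaSky-AI/Sky-T1-32B-Preview is an assumption based on this model card's hosting page; check the Hugging Face link under References for the exact identifier.

```python
# Minimal sketch: load Sky-T1-32B-Preview and run a chat-style prompt.
# Assumption: the repo id below matches the Hugging Face page in References.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovaSky-AI/Sky-T1-32B-Preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",  # use the checkpoint's native precision
)

messages = [
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Since the model is fine-tuned from Qwen2.5-32B-Instruct, apply_chat_template should pick up the Qwen chat template from the tokenizer configuration.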

Evaluation

| Benchmark            | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ  | o1-preview |
|----------------------|--------------------|-----------------------|------|------------|
| Math500              | 82.4               | 76.2                  | 85.4 | 81.4       |
| AIME2024             | 43.3               | 16.7                  | 50.0 | 40.0       |
| LiveCodeBench-Easy   | 86.3               | 84.6                  | 90.7 | 92.9       |
| LiveCodeBench-Medium | 56.8               | 40.8                  | 56.3 | 54.9       |
| LiveCodeBench-Hard   | 17.9               | 9.8                   | 17.1 | 16.3       |
| GPQA-Diamond         | 56.8               | 45.5                  | 52.5 | 75.2       |

Acknowledgement

We would like to thank Lambda Lab and AnyScale for compute resources, and the Still-2 Team and Junyang Lin from the Qwen Team for academic feedback and support.

References

Hugging Face

Blog Post