Sky-T1-32B-Preview is a 32B reasoning model fine-tuned from Qwen2.5-32B-Instruct on 17K training examples. Its performance is on par with the o1-preview model on both math and coding benchmarks. Please see our blog post for more details.
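For reference, below is a minimal inference sketch using Hugging Face Transformers. It assumes the weights are published under the `NovaSky-AI/Sky-T1-32B-Preview` repository ID and that the Qwen2.5 chat template applies; adjust the model ID, dtype, and device placement for your setup.

```python
# Minimal sketch: running Sky-T1-32B-Preview with Hugging Face Transformers.
# The repository ID and generation settings below are assumptions; adapt as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovaSky-AI/Sky-T1-32B-Preview"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B weights; bf16 keeps memory manageable
    device_map="auto",           # shard across available GPUs
)

# Build the prompt with the chat template inherited from Qwen2.5-32B-Instruct.
messages = [
    {"role": "user", "content": "Find all real x such that x^2 - 5x + 6 = 0."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```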
| Benchmark | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ | o1-preview |
|---|---|---|---|---|
| Math500 | 82.4 | 76.2 | 85.4 | 81.4 |
| AIME2024 | 43.3 | 16.7 | 50.0 | 40.0 |
| LiveCodeBench-Easy | 86.3 | 84.6 | 90.7 | 92.9 |
| LiveCodeBench-Medium | 56.8 | 40.8 | 56.3 | 54.9 |
| LiveCodeBench-Hard | 17.9 | 9.8 | 17.1 | 16.3 |
| GPQA-Diamond | 56.8 | 45.5 | 52.5 | 75.2 |
We would like to thank Lambda Labs and Anyscale for the compute resources, and the Still-2 Team and Junyang Lin from the Qwen Team for their academic feedback and support.