220 6 months ago

A regular model convert into Reasoning/Think Model fine-tuned using DeepSeek GRPO algorithm without using distilled data from R1.

tools
886ec394c1a4 · 84B
Respond in the following format:
<thinking>
...
</thinking>
<answer>
...
</answer>