mychen76/qwen2.5-3b-think-r1/system

mychen76/ qwen2.5-3b-think-r1:latest

268 Downloads Updated 1 year ago

A regular model convert into Reasoning/Think Model fine-tuned using DeepSeek GRPO algorithm without using distilled data from R1.

tools

qwen2.5-3b-think-r1:latest ... /

system

886ec394c1a4 · 84B

Respond in the following format:

<thinking>

...

</thinking>

<answer>

...

</answer>