219 6 months ago

A regular model convert into Reasoning/Think Model fine-tuned using DeepSeek GRPO algorithm without using distilled data from R1.

tools