
A regular model converted into a reasoning ("thinking") model by fine-tuning with DeepSeek's GRPO algorithm, without using any data distilled from R1.
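GRPO (Group Relative Policy Optimization) does without a separate value model. It samples a group of completions for the same prompt, scores each one, and uses each reward's deviation from the group mean as that completion's advantage. The sketch below shows that normalization together with the PPO-style clipped surrogate term GRPO optimizes. It is illustrative only. The function names, `eps`, and the `clip_eps=0.2` default are assumptions, the KL penalty against a reference model is omitted, and this is not the training code used for this model.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each completion's reward against its own group (same prompt).

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead (assumption: either is acceptable here).
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps avoids division by zero when every completion got the same reward
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one token: min(r*A, clip(r, 1-e, 1+e)*A).

    `ratio` is pi_new(token) / pi_old(token); clip_eps=0.2 is a hypothetical default.
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four sampled answers to one prompt: two judged correct (1.0), two wrong (0.0)
    adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 4) for a in adv])
    print(clipped_surrogate(1.5, adv[0]))
```

Correct answers get a positive advantage and incorrect ones a negative advantage, measured relative to the group rather than against a learned baseline.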

tools
41604d919ec8 · 32B
{
"min_p": 0.1,
"temperature": 1.5
}
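These defaults pair a fairly high `temperature` (1.5) with a `min_p` cutoff (0.1). The min-p filter discards any token whose probability is below 10% of the most likely token's probability, and that cap on low-probability tokens is what keeps the higher temperature from producing incoherent output. The sketch below shows the idea on a toy distribution. It assumes min-p filtering runs on the untempered probabilities before temperature is applied (one common sampler ordering, used here as an assumption), and the helper names are hypothetical, not Ollama's internals.

```python
import math
import random
from typing import List, Optional


def min_p_filter(probs: List[float], min_p: float) -> List[float]:
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def apply_temperature(probs: List[float], temperature: float) -> List[float]:
    """Rescale log-probabilities by 1/temperature; filtered tokens stay at zero."""
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    m = max(logits)
    exps = [math.exp(l - m) if l != float("-inf") else 0.0 for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sample(probs: List[float], min_p: float = 0.1, temperature: float = 1.5,
           rng: Optional[random.Random] = None) -> int:
    """Assumed order: min-p filter first, then temperature, then draw one token index."""
    rng = rng or random.Random()
    dist = apply_temperature(min_p_filter(probs, min_p), temperature)
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


if __name__ == "__main__":
    toy = [0.5, 0.3, 0.15, 0.04, 0.01]  # hypothetical next-token distribution
    filtered = min_p_filter(toy, 0.1)   # threshold = 0.05, so the last two tokens are dropped
    print([round(p, 3) for p in filtered])
    print([round(p, 3) for p in apply_temperature(filtered, 1.5)])
    print(sample(toy, rng=random.Random(0)))
```

With these settings the two tail tokens can never be sampled, while temperature 1.5 flattens the distribution among the three surviving candidates.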