mychen76/qwen2.5-3b-think-r1/params

mychen76/qwen2.5-3b-think-r1:latest

268 Downloads Updated 1 year ago

A regular model converted into a reasoning ("think") model, fine-tuned with DeepSeek's GRPO algorithm without using distilled data from R1.

tools


params

41604d919ec8 · 32B

{
  "min_p": 0.1,
  "temperature": 1.5
}
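To make the two parameters concrete, here is a minimal sketch of how min-p filtering and temperature scaling can combine when choosing the next token. It is an illustration only, not the runtime's actual sampler. The function names (`softmax`, `min_p_filter`, `sample_token`) are hypothetical. The order of operations, min-p on the unscaled probabilities followed by temperature on the survivors, is an assumption; real implementations may order these steps differently.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, dividing by the temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Return indices of tokens whose probability is at least min_p * max(probs)."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sample_token(logits, min_p=0.1, temperature=1.5, rng=random):
    """Sample one token index (assumed order: min-p filter, then temperature)."""
    base = softmax(logits)                      # unscaled probabilities
    kept = min_p_filter(base, min_p)            # drop very unlikely tokens
    probs = softmax([logits[i] for i in kept], temperature)
    return rng.choices(kept, weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [5.0, 4.0, 0.0]  # made-up values for three candidate tokens
    print(sample_token(toy_logits))
```

With `min_p` at 0.1, any token less than one tenth as likely as the top candidate is discarded, so the third toy token can never be drawn. A temperature of 1.5 then flattens the distribution over the remaining tokens, which makes the less likely survivor more probable than it would be at temperature 1.0.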