218 6 months ago

A regular model convert into Reasoning/Think Model fine-tuned using DeepSeek GRPO algorithm without using distilled data from R1.

tools

Models

View all →

Readme

Experimental “cot/reasoning/think” model finetuned from regular Qwen2.5-3b LLM model into a full-fledged DeepSeek-R1 like reasoning model using Group Relative Policy Optimization (GRPO ). it’s Renforce Learning (RL) algorithm that optimizes responses efficiently without requiring a value function use by DeepSeek-R1 Team.

This fine-tuned process takes a standard model then convert into a full-fledged reasoning model using GRPO. This is not fine-tuning DeepSeek’s R1 distilled models or using distilled data from R1 for tuning.

This model learned to extend its thinking time by reevaluating its initial approach, without any human guidance or predefined instructions.

++++++++++++++++++++++++++++++++++

BEFORE

++++++++++++++++++++++++++++++++++

❯ ollama run qwen2.5:3b
>>> Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did s
... he earn?
To calculate Weng's earnings for the 50 minutes of babysitting, we need to follow these steps:

1. Determine how many hours 50 minutes is equivalent to.
2. Calculate her earnings based on that number of hours.

First, let's convert 50 minutes into hours:
\[ \text{Hours} = \frac{\text{Minutes}}{60} = \frac{50}{60} = \frac{5}{6} \text{ hours} \]

Next, calculate her earnings for that duration. Given that she earns $12 per hour:
\[ \text{Earnings} = \left(\frac{5}{6}\right) \times 12 = 5 \text{ dollars} \]

So, Weng earned **$5** from babysitting for 50 minutes.

++++++++++++++++++++++++++++++++++

AFTER

++++++++++++++++++++++++++++++++++

❯ ollama run mychen76/qwen2.5-3b-think-r1
>>> Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did s
... he earn?

<thinking>
To find out how much Weng earned from her babysitting yesterday, we need to calculate the earnings 
based on the rate and the time worked.
Firstly, convert the working time (50 minutes) into hours because the wage is given per hour. There are 
60 minutes in an hour, so:
\[ \text{Time in Hours} = \frac{50}{6mlt} = 8.333... \text{hours} \]
Since Weng earns $12 for every hour she babysits, her earnings can be calculated as follows:
\[ \text{Earnings} = 12 \times 0.833\ldots = $10 \text{ (approximately)} \]
</thinking>

<answer>$10</answer>