88 Downloads · Updated 9 months ago

14 models

| Name | Size | Context window | Input | Updated |
|------|------|----------------|-------|---------|
| smallthinker:latest | 2.0GB | 32K | Text | 9 months ago |
| smallthinker:q2_k | 1.4GB | 32K | Text | 9 months ago |
| smallthinker:q3_k_l | 1.8GB | 32K | Text | 9 months ago |
| smallthinker:q4_0 | 2.0GB | 32K | Text | 9 months ago |
| smallthinker:q4_1 | 2.2GB | 32K | Text | 9 months ago |
| smallthinker:q4_k_s | 2.0GB | 32K | Text | 9 months ago |
| smallthinker:q5_0 | 2.4GB | 32K | Text | 9 months ago |
| smallthinker:q5_1 | 2.6GB | 32K | Text | 9 months ago |
| smallthinker:q5_k_s | 2.4GB | 32K | Text | 9 months ago |
| smallthinker:q6_K | 2.8GB | 32K | Text | 9 months ago |
| smallthinker:q8_0 | 3.6GB | 32K | Text | 9 months ago |
| smallthinker:iq3_xxs | 1.4GB | 32K | Text | 9 months ago |
| smallthinker:iq3_s | 1.6GB | 32K | Text | 9 months ago |
| smallthinker:iq4_xs | 1.9GB | 32K | Text | 9 months ago |
SmallThinker is a new reasoning model fine-tuned from Qwen2.5-3B-Instruct.
SmallThinker is designed for the following use cases:

1. Edge deployment: its small size makes it suitable for resource-constrained devices.
2. Draft model for QwQ-32B-Preview: SmallThinker can serve as a fast draft model for the larger QwQ-32B-Preview in speculative decoding.
Strong reasoning capability depends on training data with long chain-of-thought (CoT) traces. The authors therefore used QwQ-32B-Preview together with various synthetic-data techniques (such as PersonaHub) to create the QWQ-LONGCOT-500K dataset. Compared with similar datasets, over 75% of its samples have outputs exceeding 8K tokens. To encourage research in the open-source community, the dataset was also made publicly available.
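As an illustration of the length criterion described above, here is a hedged Python sketch that generates responses with a teacher model and keeps only long CoT traces. The model tag (`qwq`), the 8K-token threshold, the function name, and the JSONL output format are assumptions for illustration, not the authors' actual pipeline, which also relied on PersonaHub-style synthetic prompts.

```python
# Hedged sketch: generate answers with a teacher model via Ollama and keep
# only traces whose generated-token count exceeds a threshold. All names and
# thresholds here are illustrative assumptions.
import json
import ollama

MIN_OUTPUT_TOKENS = 8_000  # mirrors the ">8K output tokens" statistic above

def collect_long_cot(prompts, out_path="longcot_subset.jsonl"):
    with open(out_path, "w") as f:
        for prompt in prompts:
            result = ollama.generate(model="qwq", prompt=prompt)
            # `eval_count` is the number of tokens Ollama generated
            if result["eval_count"] >= MIN_OUTPUT_TOKENS:
                record = {"prompt": prompt, "output": result["response"]}
                f.write(json.dumps(record) + "\n")

collect_long_cot(["Prove that there are infinitely many primes."])
```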