
Eurus-2-7B-PRIME is trained with PRIME (Process Reinforcement through IMplicit rEward), an open-source method for online reinforcement learning (RL) with process rewards, to advance the reasoning abilities of language models.

Models


23 models

eurus-2-7b-prime:latest · 4.7GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q2_K · 3.0GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q3_K_S · 3.5GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q3_K_M · 3.8GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q3_K_L · 4.1GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q4_0 · 4.4GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:q4_1 · 4.9GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q4_K_S · 4.5GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q4_K_M · 4.7GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q5_0 · 5.3GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q5_1 · 5.8GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q5_K_S · 5.3GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:Q5_K_M · 5.4GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:q6_k · 6.3GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:q8_0 · 8.1GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:iq2_xxs · 2.3GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:IQ2_XS · 2.5GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:iq2_s · 2.6GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:IQ3_XXS · 3.1GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:IQ3_XS · 3.3GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:IQ3_S · 3.5GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:IQ4_XS · 4.2GB · 4K context window · Text · 9 months ago
eurus-2-7b-prime:IQ4_NL · 4.4GB · 4K context window · Text · 9 months ago
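
The tags differ only in quantization; smaller files trade answer quality for memory. Below is a minimal sketch of pulling one of the tags above and running a short prompt, assuming a locally running Ollama server and the `ollama` Python client; the chosen tag and the prompt are only examples.

```python
# Sketch: pull a specific quantization from the list above and run a prompt.
# Assumes `pip install ollama` and a local Ollama daemon; the tag and prompt
# are illustrative, not recommendations from this page.
import ollama

TAG = "eurus-2-7b-prime:Q4_K_M"  # ~4.7GB; pick e.g. IQ3_XS for tighter memory budgets

ollama.pull(TAG)  # download the chosen quantization if it is not already local

result = ollama.generate(
    model=TAG,
    prompt="What is 17 * 23? Show your reasoning.",
    options={"num_ctx": 4096},  # matches the 4K context window listed above
)
print(result["response"])
```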

Readme

  • Quantized from fp32
  • Importance matrix (i-matrix) calibrated with calibration_datav3.txt (see the workflow sketch below)
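
These notes indicate the GGUF files were quantized from an fp32 export using an importance matrix computed over calibration_datav3.txt. The sketch below shows what such a workflow typically looks like with llama.cpp's llama-imatrix and llama-quantize tools; the file names, and the assumption that the binaries are built and on PATH, are mine rather than taken from this page.

```python
# Hedged sketch of an i-matrix quantization workflow with llama.cpp.
# Assumes the llama-imatrix and llama-quantize binaries are on PATH and that
# an fp32 GGUF export already exists; all file names are illustrative.
import subprocess

SRC = "eurus-2-7b-prime-f32.gguf"     # fp32 source model (assumed name)
CALIB = "calibration_datav3.txt"      # calibration text mentioned in the readme
IMATRIX = "eurus-2-7b-prime.imatrix"  # output importance matrix (assumed name)

# 1. Compute the importance matrix from the calibration data.
subprocess.run(["llama-imatrix", "-m", SRC, "-f", CALIB, "-o", IMATRIX], check=True)

# 2. Quantize with the importance matrix; repeat for each target type in the tag list.
for qtype in ["Q4_K_M", "IQ3_XS"]:
    out = f"eurus-2-7b-prime-{qtype}.gguf"
    subprocess.run(["llama-quantize", "--imatrix", IMATRIX, SRC, out, qtype], check=True)
```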


Eurus-2-7B-PRIME is trained with the PRIME (Process Reinforcement through IMplicit rEward) method, an open-source approach to online reinforcement learning (RL) with process rewards, to advance the reasoning abilities of language models beyond imitation or distillation. It starts from Eurus-2-7B-SFT and is trained on Eurus-2-RL-Data.
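
For background (this summary comes from the PRIME and Implicit PRM papers rather than from this page, so treat it as a hedged sketch): the "implicit reward" idea is that a reward model trained only with outcome labels yields dense, token-level process rewards as a log-likelihood ratio against a reference model,

$$ r_t = \beta \log \frac{\pi_\phi(y_t \mid \mathbf{x}, y_{<t})}{\pi_{\mathrm{ref}}(y_t \mid \mathbf{x}, y_{<t})}, $$

where $\pi_\phi$ is the implicit reward model, $\pi_{\mathrm{ref}}$ is a frozen reference model, and $\beta$ is a scaling coefficient; these per-token rewards are combined with the outcome reward during RL.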

System Prompt

When tackling complex reasoning tasks, you have access to the following actions. Use them as needed to progress through your thought process.

[ASSESS]

[ADVANCE]

[VERIFY]

[SIMPLIFY]

[SYNTHESIZE]

[PIVOT]

[OUTPUT]

You should strictly follow the format below:

[ACTION NAME]

# Your action step 1

# Your action step 2

# Your action step 3

...

Next action: [NEXT ACTION NAME]
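
The sketch below passes the system prompt above to the model through the Ollama chat API, assuming the `ollama` Python client and a locally running server; the user question is illustrative.

```python
# Sketch: use the structured-reasoning system prompt with eurus-2-7b-prime via
# the `ollama` Python client (pip install ollama); assumes a local Ollama daemon.
import ollama

SYSTEM_PROMPT = """When tackling complex reasoning tasks, you have access to the following actions. Use them as needed to progress through your thought process.

[ASSESS]
[ADVANCE]
[VERIFY]
[SIMPLIFY]
[SYNTHESIZE]
[PIVOT]
[OUTPUT]

You should strictly follow the format below:

[ACTION NAME]

# Your action step 1

# Your action step 2

# Your action step 3

...

Next action: [NEXT ACTION NAME]"""

response = ollama.chat(
    model="eurus-2-7b-prime:latest",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "A train travels 120 km in 90 minutes. What is its average speed in km/h?"},
    ],
)
print(response["message"]["content"])
```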

References

Hugging Face