mannix/gemma2-9b-sppo-iter3

1,006 Downloads Updated 1 year ago

This model was developed using Self-Play Preference Optimization at iteration 3, with google/gemma-2-9b-it as the starting point.

Models

24 models

| Name | Size | Context | Input | Updated |
|------|------|---------|-------|---------|
| gemma2-9b-sppo-iter3:latest | 5.8GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q2_k | 3.8GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q3_k_s | 4.3GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q3_k_m | 4.8GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q3_k_l | 5.1GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q4_0 | 5.5GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q4_1 | 6.0GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q4_k_s | 5.5GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q4_k_m | 5.8GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q5_0 | 6.5GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q5_1 | 7.0GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q5_k_s | 6.5GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q5_k_m | 6.6GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q6_k | 7.6GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:q8_0 | 9.8GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:iq2_xxs | 2.8GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:iq2_xs | 3.1GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:iq2_s | 3.2GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:iq3_xxs | 3.8GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:iq3_xs | 4.1GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:iq3_s | 4.3GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:iq4_xs | 5.2GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:iq4_nl | 5.4GB | 8K | Text | 1 year ago |
| gemma2-9b-sppo-iter3:fp16 | 18GB | 8K | Text | 1 year ago |
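Any of the tags listed above can be queried through a locally running Ollama server once it has been pulled. The following is a minimal sketch, assuming the server listens on its default address `http://localhost:11434` and that the chosen tag is already downloaded; the helper names `build_request` and `generate` are hypothetical and only illustrate the request shape.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "mannix/gemma2-9b-sppo-iter3"


def build_request(prompt, tag="q4_k_m", url=OLLAMA_URL):
    """Build a non-streaming generate request for one quantization tag."""
    payload = {
        "model": f"{MODEL}:{tag}",  # e.g. mannix/gemma2-9b-sppo-iter3:q4_k_m
        "prompt": prompt,
        "stream": False,            # ask for a single JSON object, not a stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, tag="q4_k_m"):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, tag)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain self-play in one sentence.", tag="q8_0")` would use the 9.8GB tag instead. Smaller tags such as `iq2_xxs` trade output quality for memory.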

Readme

- Quantizations with i-matrix `calibration_datav3.txt`
- Safetensors converted to fp32

Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)

# Gemma-2-9B-It-SPPO-Iter3

This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, with [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) as the starting point. We used the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
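For orientation, here is a rough per-sample sketch of the squared-error objective as we read it from the cited paper: the log-ratio between the new policy and the previous iteration's policy is regressed toward `eta * (win_prob - 0.5)`. It is not the training code. The function name `sppo_loss` and its arguments are hypothetical, and `win_prob` stands for an estimated probability that the response beats the previous policy, which would come from a preference model in practice.

```python
def sppo_loss(logp_theta, logp_prev, win_prob, eta=1000.0):
    """Squared error between the policy log-ratio and the scaled win margin.

    logp_theta: log-probability of the response under the policy being trained
    logp_prev:  log-probability of the same response under the previous iterate
    win_prob:   estimated probability that the response beats the previous
                policy (assumed to be supplied by a preference model)
    eta:        scaling factor; 1000 matches the `eta` hyperparameter on this card
    """
    log_ratio = logp_theta - logp_prev
    target = eta * (win_prob - 0.5)
    return (log_ratio - target) ** 2


# A response judged no better than average (win_prob = 0.5) is pushed
# toward keeping its previous log-probability.
print(sppo_loss(logp_theta=-10.0, logp_prev=-10.0, win_prob=0.5))
```

Under this reading, responses with `win_prob` above 0.5 have their log-probability pushed up relative to the previous iteration, and those below 0.5 pushed down.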

## Links to Other Models
- [Gemma-2-9B-It-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1)
- [Gemma-2-9B-It-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2)
- [Gemma-2-9B-It-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3)

### Model Description

- Model type: A 9B parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: Apache-2.0
- Finetuned from model: google/gemma-2-9b-it

## [AlpacaEval Leaderboard Evaluation Results](https://tatsu-lab.github.io/alpaca_eval/)

| Model | LC. Win Rate | Win Rate | Avg. Length |
|-------|:------------:|:--------:|:-----------:|
| [Gemma-2-9B-SPPO Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1) | 48.70 | 40.76 | 1669 |
| [Gemma-2-9B-SPPO Iter2](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2) | 50.93 | 44.64 | 1759 |
| [Gemma-2-9B-SPPO Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3) | **53.27** | **47.74** | 1803 |

### Training hyperparameters
The following hyperparameters were used during training:

- learning_rate: 5e-07
- eta: 1000
- per_device_train_batch_size: 8
- gradient_accumulation_steps: 1
- seed: 42
- distributed_type: deepspeed_zero3
- num_devices: 8
- optimizer: RMSProp 
- lr_scheduler_type: linear 
- lr_scheduler_warmup_ratio: 0.1
- num_train_epochs: 1.0
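Two quantities follow from the list above: the effective global batch size, and the shape of the learning-rate curve. The sketch below assumes the common linear-warmup then linear-decay schedule; the actual training code may differ. The total step count is not stated on this card, so `total_steps=1000` is purely illustrative, and both function names are hypothetical.

```python
def effective_batch_size(per_device, grad_accum, num_devices):
    """Samples contributing to each optimizer step across all devices."""
    return per_device * grad_accum * num_devices


def linear_lr(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# 8 samples per device x 1 accumulation step x 8 devices
print(effective_batch_size(8, 1, 8))  # 64

# Illustrative only: total_steps is an assumed value, not from the card.
for step in (0, 100, 550, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

With these settings the learning rate peaks at 5e-07 once 10% of the steps have elapsed and reaches zero at the final step.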

## Citation
```
@misc{wu2024self,
      title={Self-Play Preference Optimization for Language Model Alignment}, 
      author={Wu, Yue and Sun, Zhiqing and Yuan, Huizhuo and Ji, Kaixuan and Yang, Yiming and Gu, Quanquan},
      year={2024},
      eprint={2405.00675},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
