This model was developed using Self-Play Preference Optimization (SPPO) at iteration 3, starting from google/gemma-2-9b-it.

Models

24 models

gemma2-9b-sppo-iter3:latest

5.8GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q2_k

3.8GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q3_k_s

4.3GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q3_k_m

4.8GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q3_k_l

5.1GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q4_0

5.5GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q4_1

6.0GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q4_k_s

5.5GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q4_k_m

5.8GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q5_0

6.5GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q5_1

7.0GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q5_k_s

6.5GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q5_k_m

6.6GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q6_k

7.6GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:q8_0

9.8GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:iq2_xxs

2.8GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:iq2_xs

3.1GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:iq2_s

3.2GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:iq3_xxs

3.8GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:iq3_xs

4.1GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:iq3_s

4.3GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:iq4_xs

5.2GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:iq4_nl

5.4GB · 8K context window · Text · 1 year ago

gemma2-9b-sppo-iter3:fp16

18GB · 8K context window · Text · 1 year ago
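
The listed sizes can be turned into a rough bits-per-weight figure when choosing a tag. The sketch below is an estimate only: it assumes about 9.24 billion parameters and decimal gigabytes, neither of which is stated in the listing, and it ignores file metadata and tensors stored at higher precision.

```python
# Rough bits-per-weight estimate from the download sizes listed above.
# ASSUMPTION: ~9.24e9 parameters and decimal GB (1 GB = 1e9 bytes);
# neither value is stated in the listing.
ASSUMED_PARAMS = 9.24e9

# A subset of the tags above, with their listed sizes in GB.
sizes_gb = {
    "iq2_xxs": 2.8,
    "q2_k": 3.8,
    "q4_k_m": 5.8,
    "q5_k_m": 6.6,
    "q6_k": 7.6,
    "q8_0": 9.8,
    "fp16": 18.0,
}


def bits_per_weight(size_gb: float, params: float = ASSUMED_PARAMS) -> float:
    """Convert a file size in GB into average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


estimates = {tag: round(bits_per_weight(gb), 2) for tag, gb in sizes_gb.items()}

for tag, bpw in estimates.items():
    print(f"{tag:8s} {bpw:5.2f} bits/weight")
```

Because the listed sizes are rounded, the results are approximate; they are mainly useful for comparing tags against each other.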

Readme

  • Quantized using an importance matrix (i-matrix) computed from calibration_datav3.txt
  • Safetensors converted to fp32

Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)

Gemma-2-9B-It-SPPO-Iter3

This model was developed using Self-Play Preference Optimization (SPPO) at iteration 3, starting from google/gemma-2-9b-it. We used the prompt sets from the openbmb/UltraFeedback dataset, split into three parts for the three iterations by snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. All responses used are synthetic.

Model Description

  • Model type: A 9B-parameter GPT-like model fine-tuned on synthetic datasets.
  • Language(s) (NLP): Primarily English
  • License: Apache-2.0
  • Finetuned from model: google/gemma-2-9b-it

AlpacaEval Leaderboard Evaluation Results

Model                    LC Win Rate (%)   Win Rate (%)   Avg. Length
Gemma-2-9B-SPPO Iter1    48.70             40.76          1669
Gemma-2-9B-SPPO Iter2    50.93             44.64          1759
Gemma-2-9B-SPPO Iter3    53.27             47.74          1803

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • eta: 1000
  • per_device_train_batch_size: 8
  • gradient_accumulation_steps: 1
  • seed: 42
  • distributed_type: deepspeed_zero3
  • num_devices: 8
  • optimizer: RMSProp
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_train_epochs: 1.0
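
The values above imply an effective global batch size and a learning-rate curve that can be sketched as follows. The schedule shape (linear warmup to the peak, then linear decay to zero) is a common convention for a "linear" scheduler and is assumed here rather than taken from the source; `total_steps` is a hypothetical value chosen only for illustration.

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_devices = 8
learning_rate = 5e-07
warmup_ratio = 0.1

# Samples consumed per optimizer step across all devices.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)


def linear_lr(step: int, total_steps: int,
              peak_lr: float = learning_rate,
              warmup: float = warmup_ratio) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical; the real step count is not given in the source
print("effective batch size:", effective_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, total_steps))
```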

Citation

@misc{wu2024self,
      title={Self-Play Preference Optimization for Language Model Alignment}, 
      author={Wu, Yue and Sun, Zhiqing and Yuan, Huizhuo and Ji, Kaixuan and Yang, Yiming and Gu, Quanquan},
      year={2024},
      eprint={2405.00675},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}