1,006 Downloads · Updated 1 year ago
24 models, all with an 8K context window and text input, last updated 1 year ago:

| Tag | Size |
|---|---|
| gemma2-9b-sppo-iter3:latest | 5.8GB |
| gemma2-9b-sppo-iter3:q2_k | 3.8GB |
| gemma2-9b-sppo-iter3:q3_k_s | 4.3GB |
| gemma2-9b-sppo-iter3:q3_k_m | 4.8GB |
| gemma2-9b-sppo-iter3:q3_k_l | 5.1GB |
| gemma2-9b-sppo-iter3:q4_0 | 5.5GB |
| gemma2-9b-sppo-iter3:q4_1 | 6.0GB |
| gemma2-9b-sppo-iter3:q4_k_s | 5.5GB |
| gemma2-9b-sppo-iter3:q4_k_m | 5.8GB |
| gemma2-9b-sppo-iter3:q5_0 | 6.5GB |
| gemma2-9b-sppo-iter3:q5_1 | 7.0GB |
| gemma2-9b-sppo-iter3:q5_k_s | 6.5GB |
| gemma2-9b-sppo-iter3:q5_k_m | 6.6GB |
| gemma2-9b-sppo-iter3:q6_k | 7.6GB |
| gemma2-9b-sppo-iter3:q8_0 | 9.8GB |
| gemma2-9b-sppo-iter3:iq2_xxs | 2.8GB |
| gemma2-9b-sppo-iter3:iq2_xs | 3.1GB |
| gemma2-9b-sppo-iter3:iq2_s | 3.2GB |
| gemma2-9b-sppo-iter3:iq3_xxs | 3.8GB |
| gemma2-9b-sppo-iter3:iq3_xs | 4.1GB |
| gemma2-9b-sppo-iter3:iq3_s | 4.3GB |
| gemma2-9b-sppo-iter3:iq4_xs | 5.2GB |
| gemma2-9b-sppo-iter3:iq4_nl | 5.4GB |
| gemma2-9b-sppo-iter3:fp16 | 18GB |
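Any of the tags above can be served locally with Ollama. Below is a minimal sketch using Ollama's documented REST API from Python; it assumes a local Ollama server on the default port and that the tag has already been pulled (e.g. `ollama pull gemma2-9b-sppo-iter3:q4_k_m`; depending on how the model is published, the tag may also need the publisher's namespace prefix):

```python
# Minimal sketch: query a local Ollama server via its REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma2-9b-sppo-iter3:q4_k_m",  # any tag from the table above
        "prompt": "Explain self-play preference optimization in two sentences.",
        "stream": False,
        "options": {"num_ctx": 8192},  # matches the model's 8K context window
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Smaller quantizations (e.g. the iq2/iq3 tags) trade output quality for lower memory use; q4_k_m is a common middle ground.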
The imatrix quantizations (iq* tags) use calibration_datav3.txt as calibration data.
Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
This model was developed with Self-Play Preference Optimization (SPPO) at iteration 3, using google/gemma-2-9b-it as the starting point. Training used prompt sets from the openbmb/UltraFeedback dataset, split into three parts (one per iteration) following snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. All responses used in training are synthetic.
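For intuition, here is a rough sketch of the squared-error objective SPPO optimizes at each iteration, as described in the paper: the policy's log-density ratio against the previous iterate is regressed toward the (scaled, centred) estimated probability that a response beats the previous policy. This is not the authors' training code; all names below (logp_theta, logp_ref, win_prob) and the eta value are illustrative assumptions.

```python
import torch

def sppo_loss(logp_theta: torch.Tensor,  # sum log-prob of each response under the current policy
              logp_ref: torch.Tensor,    # sum log-prob under the previous iterate (frozen)
              win_prob: torch.Tensor,    # estimated P(response beats previous policy), e.g. from PairRM
              eta: float = 1e3) -> torch.Tensor:
    log_ratio = logp_theta - logp_ref          # log pi_theta(y|x) - log pi_t(y|x)
    target = eta * (win_prob - 0.5)            # centred preference signal
    return ((log_ratio - target) ** 2).mean()  # squared regression loss

# Toy usage with made-up numbers:
loss = sppo_loss(torch.tensor([-120.0, -95.0]),
                 torch.tensor([-118.0, -99.0]),
                 torch.tensor([0.42, 0.63]))
print(loss)
```

Because the win probability is estimated from synthetic comparisons against the model's own samples, each iteration can bootstrap from the previous one without new human preference labels.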
Evaluation results (AlpacaEval 2.0):

| Model | LC (length-controlled) Win Rate (%) | Win Rate (%) | Avg. Length |
|---|---|---|---|
| Gemma-2-9B-SPPO Iter1 | 48.70 | 40.76 | 1669 |
| Gemma-2-9B-SPPO Iter2 | 50.93 | 44.64 | 1759 |
| Gemma-2-9B-SPPO Iter3 | 53.27 | 47.74 | 1803 |
The training hyperparameters are documented in the original model card. To cite the SPPO paper:
@misc{wu2024self,
title={Self-Play Preference Optimization for Language Model Alignment},
author={Wu, Yue and Sun, Zhiqing and Yuan, Huizhuo and Ji, Kaixuan and Yang, Yiming and Gu, Quanquan},
year={2024},
eprint={2405.00675},
archivePrefix={arXiv},
primaryClass={cs.LG}
}