| Name | Size | Context window | Input | Updated |
| --- | --- | --- | --- | --- |
| starling-lm-10.7b:latest | 6.1GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q2_K | 4.0GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q3_K_S | 4.7GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q3_K_M | 5.2GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q3_K_L | 5.7GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q4_0 | 6.1GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q4_K_S | 6.1GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q4_K_M | 6.5GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q5_0 | 7.4GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q5_K_S | 7.4GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q5_K_M | 7.6GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q6_K | 8.8GB | 8K | Text | 1 year ago |
| starling-lm-10.7b:q8_0 | 11GB | 8K | Text | 1 year ago |
This is Starling-LM-10.7B-beta, a depth-upscaled version of Nexusflow/Starling-LM-7B-beta. It is intended as a drop-in upgrade for the original 7B-parameter model.
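As a minimal sketch of how one of the tags above might be used, the following assumes the official `ollama` Python client is installed (`pip install ollama`), an Ollama server is running locally, and the chosen tag has been pulled; the tag choice and prompt are illustrative:

```python
import ollama

# Any tag from the table above works; q4_K_M is a common
# size/quality tradeoff. The prompt here is purely illustrative.
response = ollama.chat(
    model="starling-lm-10.7b:q4_K_M",
    messages=[{"role": "user", "content": "In one paragraph, what is depth upscaling?"}],
)
print(response["message"]["content"])
```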
We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model, Nexusflow/Starling-RM-34B, and the PPO policy optimization method from Fine-Tuning Language Models from Human Preferences. Harnessing the ranking dataset berkeley-nest/Nectar, the upgraded reward model Starling-RM-34B, and the new reward-training and policy-tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 on MT-Bench with GPT-4 as a judge.
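To make the RLAIF signal concrete, here is a toy, self-contained sketch of the core idea only: a reward model scores candidate responses, and the resulting ranking drives training. Everything in it (the `rank_candidates` helper and the length-based stand-in reward) is illustrative and is not the actual Nectar/Starling-RM-34B/PPO pipeline:

```python
from typing import Callable

def rank_candidates(candidates: list[str], reward_model: Callable[[str], float]) -> list[str]:
    """Order candidate responses from best to worst by reward-model score."""
    return sorted(candidates, key=reward_model, reverse=True)

if __name__ == "__main__":
    # Stand-in reward that prefers concise answers; the real reward
    # model here is a separate 34B LLM trained on ranked responses.
    toy_reward = lambda text: -len(text)
    candidates = [
        "Paris is the capital of France.",
        "The capital of France, a country in Western Europe, is the city of Paris.",
    ]
    # The top-ranked response is what a policy-tuning step would reinforce.
    print(rank_candidates(candidates, toy_reward)[0])
```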
Important: the model's output can be verbose in rare cases. Consider setting the temperature to 0 to reduce this; the default temperature is 0.1.
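A minimal sketch of one way to override that default per request, again assuming the `ollama` Python client and a local server; the prompt is illustrative:

```python
import ollama

# Override the default temperature (0.1) for this request to make
# occasionally verbose outputs less likely, per the note above.
response = ollama.chat(
    model="starling-lm-10.7b:latest",
    messages=[{"role": "user", "content": "Name the capital of France."}],
    options={"temperature": 0},
)
print(response["message"]["content"])
```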
@HuggingFace https://huggingface.co/bartowski/Starling-LM-10.7B-beta-GGUF