53 Downloads Updated 2 weeks ago
3c0f13a769d7 · 17GB
MARS / I-MATRIX / 28B / I-QUANT
A relatively new model as of November 2025, with claims of beating Mistral 24B and of interpreting complex character cards well. To fit as many parameters into as little VRAM as possible, weighted K-quants and I-quants are listed, along with multiple distillations, as the original creator of the model has offered many.
Note that I-quants forfeit some token generation speed relative to K-quants in exchange for storage efficiency. The creator has also mentioned this model being heavy on memory usage: specifically, the 4-bit small quantization could not be fit into VRAM at only 8k context.
Although that quantization is considered the 'standard' when running roleplay models locally, 3-bit models have become increasingly usable at higher parameter counts compared to 4-bit models, to the point that the 3-bit small quantization should be performant at 28B (above 18B, from my experience). The included importance matrix and I-quants will help remedy the quality loss regardless.
The small 3-bit quant is recommended for 16GB GPUs. These models were taken from GGUF files on Hugging Face.
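The VRAM claims above can be sanity-checked with some back-of-the-envelope math: quantized weight size is roughly parameter count times bits-per-weight, plus a KV cache that grows with context length. The sketch below is a rough estimate only; the bits-per-weight figures are approximate community values for llama.cpp quant types, and the architecture numbers for this 28B model are hypothetical placeholders, not published specs.

```python
# Rough VRAM estimate for a quantized GGUF model: weights + KV cache.
# All bits-per-weight and architecture numbers are assumptions, not
# figures from this model's card.

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(ctx: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GB (fp16 K and V per layer)."""
    return 2 * ctx * layers * kv_heads * head_dim * bytes_per_elem / 1e9

# Hypothetical shape for a ~28B model (placeholder values).
params_b, layers, kv_heads, head_dim = 28, 48, 8, 128

iq3_s  = weight_gb(params_b, 3.5)   # ~3.5 bpw for IQ3_S (approx.)
q4_k_s = weight_gb(params_b, 4.6)   # ~4.6 bpw for Q4_K_S (approx.)
kv_8k  = kv_cache_gb(8192, layers, kv_heads, head_dim)

print(f"IQ3_S weights : {iq3_s:.1f} GB")   # well under 16 GB
print(f"Q4_K_S weights: {q4_k_s:.1f} GB")  # already near 16 GB
print(f"8k KV cache   : {kv_8k:.1f} GB")   # pushes Q4_K_S over budget
```

Under these assumptions the 4-bit small quant plus an 8k KV cache overshoots a 16GB card, while the 3-bit small quant leaves a few GB of headroom, which is consistent with the recommendations above.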
GGUF weighted quantizations (mradermacher):
[No obligatory model picture. Mars did not have one.]