123 Downloads Updated 3 weeks ago
375ead64bcdb · 11GB
RP INK / I-MATRIX / 32B / I-QUANT
This model’s received some good praise for its creative writing skill. It’s particularly good at following the writing style of the prompt itself. For some reference, if it matters: I’d say it writes better than Magnum V4, even at a lower bit depth than Magnum. To fit as many parameters into as little VRAM as possible, weighted I-quants are listed below.
Note that I-quants forfeit some token generation speed relative to K-quants in exchange for storage efficiency. The IQ3_XXS quant is recommended for 16GB GPUs with a well-supported parallel compute platform (ROCm or CUDA); without one, 16GB GPUs should stick to Q2_K and below. Full GPU offload at IQ3_S with 16GB of VRAM is possible, but you will be pushing it. These files were sourced from mradermacher’s GGUF uploads on Hugging Face.
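As a rough sanity check for whether a given quant will fully offload, compare the file size plus KV cache and runtime overhead against your VRAM. Here is a minimal sketch; the file sizes, KV cache, and overhead figures are illustrative assumptions, not measured values, and actual usage varies with context length and runtime:

```python
# Rough headroom estimate for full GPU offload of a GGUF quant.
# All numbers here are illustrative assumptions, not measurements;
# real usage depends on context length, KV-cache settings, and the
# runtime's own overhead.

def vram_headroom_gb(weights_gb: float, vram_gb: float,
                     kv_cache_gb: float = 1.0, overhead_gb: float = 0.8) -> float:
    """Estimated VRAM left after weights, KV cache, and runtime overhead."""
    return vram_gb - (weights_gb + kv_cache_gb + overhead_gb)

# Hypothetical file sizes for a 32B model at these quant levels:
for quant, size_gb in [("Q2_K", 11.5), ("IQ3_XXS", 12.5), ("IQ3_S", 14.0)]:
    headroom = vram_headroom_gb(size_gb, vram_gb=16.0)
    status = "fits" if headroom > 0.5 else ("pushing it" if headroom > 0 else "won't fit")
    print(f"{quant}: ~{size_gb} GB weights, {headroom:.1f} GB headroom -> {status}")
```

With these assumed numbers, IQ3_XXS leaves a little headroom on a 16GB card while IQ3_S barely squeaks by, which matches the recommendation above.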
GGUF weighted quantizations (mradermacher):