Aqui-VL 24B Mistral is an advanced vision-language model based on Mistral Small 3.1, designed to deliver exceptional performance while remaining accessible on consumer-grade hardware. It is the first open-weights model from Aqui Solutions, the company behind AquiGPT. With 23.6 billion parameters, it runs efficiently on a single RTX 4090 GPU or a 32 GB Mac, putting cutting-edge AI capabilities within reach of researchers, developers, and enthusiasts.
Aqui-VL 24B Mistral demonstrates competitive performance across multiple domains:
| Benchmark | Aqui-VL 24B Mistral | Mistral Small 3.1 | Llama 3.1 70B |
|---|---|---|---|
| IFEval (Instruction Following) | 88.3% | 82.6% | 87.5% |
| MMLU (General Knowledge) | 80.9% | 80.5% | 86.0% |
| GPQA (Science Q&A) | 44.7% | 44.4% | 46.7% |
| HumanEval (Coding) | 92.5% | 88.9% | 80.5% |
| MATH (Mathematics) | 69.3% | 69.5% | 68.0% |
| MMMU (General Vision) | 64.0% | 62.5% | N/A\* |
| ChartQA (Chart Analysis) | 87.6% | 86.2% | N/A\* |
| DocVQA (Document Analysis) | 94.9% | 94.1% | N/A\* |
| Average Text Performance | 75.1% | 73.2% | 73.7% |
| Average Vision Performance | 82.2% | 80.9% | N/A\* |

\*Llama 3.1 70B does not include vision capabilities.
To run the model with the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/aqui-vl-24b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate text
prompt = "Explain quantum computing in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
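Mistral-derived models are usually served with a chat template; below is a minimal chat-style sketch reusing the `tokenizer` and `model` loaded above, assuming the repository ships a chat template in its tokenizer config:

```python
# Build a chat-formatted prompt and generate a reply
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```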
To run the model locally with Ollama:

```bash
# Pull the model
ollama pull aquiffoo/aqui-vl-24b

# Run interactive chat
ollama run aquiffoo/aqui-vl-24b
```
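Ollama also exposes a local REST API (port 11434 by default). Here is a minimal sketch of calling it from Python with `requests`, assuming the server is running and the model has been pulled as above:

```python
import requests

# Request a single (non-streamed) completion from the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "aquiffoo/aqui-vl-24b",
        "prompt": "Explain quantum computing in simple terms:",
        "stream": False,  # return the full completion as one JSON object
    },
)
print(resp.json()["response"])
```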
To use the quantized weights with llama.cpp:

```bash
# Download the quantized model (Q4_K_M, 14.4 GB)
wget https://huggingface.co/aquigpt/aqui-vl-24b/resolve/main/aqui-vl-24b-q4_k_m.gguf

# Run with llama.cpp
./main -m aqui-vl-24b-q4_k_m.gguf -p "Your prompt here" -n 100
```
With a 92.5% score on HumanEval, Aqui-VL 24B Mistral excels at:

- Writing clean, efficient code in multiple languages
- Debugging and code review
- Algorithm implementation
- Technical documentation
Strong vision capabilities enable the following use cases (see the inference sketch after this list):

- PDF document analysis and Q&A
- Chart and graph interpretation
- Scientific paper comprehension
- Business report analysis
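A minimal sketch of multimodal inference, assuming the repository ships a processor config compatible with Transformers' image-text-to-text interface (as Mistral Small 3.1 derivatives generally do); the image URL and question are placeholders:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_name = "aquigpt/aqui-vl-24b"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# One user turn containing an image plus a text question (placeholder URL)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},
        {"type": "text", "text": "Summarize the main trend in this chart."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.float16)

outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```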
Aqui-VL 24B Mistral is available exclusively in Q4_K_M quantization, chosen as the best balance between output quality and hardware compatibility.
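Beyond the llama.cpp CLI shown above, the GGUF file can also be used programmatically. A minimal sketch with the `llama-cpp-python` bindings, assuming the Q4_K_M file downloaded above sits in the working directory:

```python
from llama_cpp import Llama

# Load the quantized GGUF model; n_gpu_layers=-1 offloads all layers to GPU
llm = Llama(
    model_path="aqui-vl-24b-q4_k_m.gguf",
    n_ctx=4096,       # context window for this session
    n_gpu_layers=-1,
)
out = llm("Explain quantum computing in simple terms:", max_tokens=200, temperature=0.7)
print(out["choices"][0]["text"])
```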
Aqui-VL 24B Mistral supports the following customization paths (see the LoRA sketch after this list):

- Parameter-efficient fine-tuning (LoRA, QLoRA)
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Multi-modal fine-tuning for specific vision tasks
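A minimal LoRA setup sketch with the PEFT library, assuming the attention projection module names standard for Mistral-style architectures; the rank, scaling, and dropout values are illustrative starting points:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
import torch

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/aqui-vl-24b", torch_dtype=torch.float16, device_map="auto"
)

lora_config = LoraConfig(
    r=16,                 # low-rank adapter dimension
    lora_alpha=32,        # adapter scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed Mistral-style names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```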
This model is released under the Apache 2.0 License, making it suitable for both research and commercial applications.
For questions and support regarding Aqui-VL 24B Mistral, please visit the Hugging Face repository and use the community discussions section.
Built upon the excellent foundation of Mistral Small 3.1 by Mistral AI. Special thanks to the open-source community for tools and datasets that made this model possible.
Copyright 2025 Aqui Solutions. All rights reserved.