aquif-moe-400m
aquif-moe-400m is our compact Mixture of Experts (MoE) model, with only 400 million active parameters. It offers impressive performance-per-VRAM efficiency, making it a strong choice for resource-limited setups.
Model Overview
- Name: aquif-moe-400m
- Parameters: 400 million active parameters (1.3 billion total)
- Context Window: 128,000 tokens
- Architecture: Mixture of Experts (MoE)
- Type: General-purpose LLM
- Hosted on: Ollama, Hugging Face
Key Features
- Highly efficient VRAM utilization (77.3 performance points per GB)
- Expansive 128K token context window for handling long documents (see the long-context sketch after this list)
- Competitive performance despite fewer parameters
- Optimized for local inference on consumer hardware
- Ideal for resource-constrained environments
- Supports high-throughput concurrent sessions
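As an illustration of the long-context claim, here is a minimal sketch using the official ollama Python client, assuming a local Ollama server with the model already pulled. The input file name is a placeholder, and requesting the full 128K window via `num_ctx` is an assumption about how you would configure it (larger windows need more memory):

```python
# Long-document summarization sketch (assumes a local Ollama server and
# that `ollama pull aquiffoo/aquif-moe-400m` has already been run).
import ollama

# Hypothetical input file; replace with your own long document.
with open("long_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = ollama.generate(
    model="aquiffoo/aquif-moe-400m",
    prompt=f"Summarize the following document:\n\n{document}",
    # Ask for a large context window; 131072 tokens corresponds to the
    # advertised 128K limit. Larger windows use more memory.
    options={"num_ctx": 131072},
)
print(response["response"])
```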
Performance Benchmarks
aquif-moe-400m delivers solid performance across multiple benchmarks, especially for its size:
| Benchmark  | aquif-moe (0.4b) | Qwen 2.5 (0.5b) | Gemma 3 (1b) |
|------------|------------------|-----------------|--------------|
| MMLU       | 26.6             | 45.4            | 26.5         |
| HumanEval  | 32.3             | 22.0            | 8.1          |
| GSM8K      | 33.9             | 36.0            | 6.1          |
| Average    | 30.9             | 34.4            | 11.3         |
VRAM Efficiency
aquif-moe-400m excels in VRAM efficiency:
| Model     | Average Performance | VRAM (GB) | Performance per VRAM (points/GB) |
|-----------|---------------------|-----------|----------------------------------|
| aquif-moe | 30.9                | 0.4       | 77.3                             |
| Qwen 2.5  | 34.4                | 0.6       | 57.3                             |
| Gemma 3   | 11.3                | 1.0       | 11.3                             |
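The efficiency column is simply the benchmark average divided by the VRAM footprint (the table rounds to one decimal place); a quick check in Python:

```python
# Reproduce the "Performance per VRAM" column from the table above.
models = {
    "aquif-moe": (30.9, 0.4),
    "Qwen 2.5": (34.4, 0.6),
    "Gemma 3": (11.3, 1.0),
}

for name, (avg_score, vram_gb) in models.items():
    # Prints 77.25, 57.33 and 11.30, which the table rounds to 77.3, 57.3 and 11.3.
    print(f"{name}: {avg_score / vram_gb:.2f} points per GB")
```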
Use Cases
- Edge computing and resource-constrained environments
- Mobile and embedded applications
- Local development environments
- Quick prototyping and testing
- Personal assistants on consumer hardware
- Enterprise deployment with multiple concurrent sessions (see the concurrency sketch after this list)
- Long document analysis and summarization
- High-throughput production environments
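To illustrate the concurrent-session use cases, here is a minimal sketch that fans a few prompts out to a locally running Ollama server over its REST API. It assumes the default address http://localhost:11434 and that the model has already been pulled; the prompts are placeholders:

```python
# Send several prompts to a local Ollama server concurrently.
from concurrent.futures import ThreadPoolExecutor

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "aquiffoo/aquif-moe-400m"

def ask(prompt: str) -> str:
    # Non-streaming generate call; returns the full completion as a string.
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

prompts = [
    "Summarize the benefits of Mixture of Experts models in one sentence.",
    "List three uses for a small local language model.",
    "Explain what a context window is in one sentence.",
]

# One thread per request; the server handles the concurrent sessions.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for prompt, answer in zip(prompts, pool.map(ask, prompts)):
        print(f"Q: {prompt}\nA: {answer}\n")
```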
Limitations
- No thinking mode capability
- May hallucinate in some areas
- May struggle with more complex reasoning tasks
- Not optimized for specialized domains
Getting Started
To run via Ollama:
```bash
ollama run aquiffoo/aquif-moe-400m
```
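For programmatic access, a minimal sketch using the official ollama Python client (`pip install ollama`), assuming the model has already been pulled:

```python
# Minimal chat call via the ollama Python client.
import ollama

response = ollama.chat(
    model="aquiffoo/aquif-moe-400m",
    messages=[{"role": "user", "content": "Explain Mixture of Experts in one sentence."}],
)
print(response["message"]["content"])
```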