
Our smallest model is also a Mixture of Experts. With 0.4B active and 1.3B total parameters, it outperforms much larger models in efficiency while remaining competitive in raw performance.

Model file: granitemoe architecture · 1.33B parameters · Q8_0 quantization · 1.4GB · Apache License 2.0. Default parameters: temperature 0.7.

Readme

aquif-moe-400m

aquif-moe-400m is our compact Mixture of Experts (MoE) model, with only 400 million active parameters. It offers impressive performance-per-VRAM efficiency, making it a strong choice for resource-limited setups.

Model Overview

  • Name: aquif-moe-400m
  • Parameters: 400 million active parameters (1.3 billion total)
  • Context Window: 128,000 tokens
  • Architecture: Mixture of Experts (MoE)
  • Type: General-purpose LLM
  • Hosted on: Ollama, Hugging Face

Key Features

  • Highly efficient VRAM utilization (77.3 performance points per GB)
  • Expansive 128K token context window for handling long documents
  • Competitive performance despite fewer parameters
  • Optimized for local inference on consumer hardware
  • Ideal for resource-constrained environments
  • Supports high-throughput concurrent sessions (see the sketch after this list)
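
Because the active-parameter count is small, several sessions can share a single local Ollama instance cheaply. Below is a minimal sketch of concurrent requests, assuming the model has already been pulled and the official ollama Python client is installed; the prompts and helper names are illustrative, and the response access pattern may differ slightly between client versions.

```python
import asyncio
from ollama import AsyncClient  # assumes the official ollama Python package is installed


async def ask(client: AsyncClient, prompt: str) -> str:
    # Each call is an independent chat session against the same local model.
    response = await client.chat(
        model="aquiffoo/aquif-moe-400m",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]


async def main() -> None:
    client = AsyncClient()
    prompts = [
        "Summarize Mixture of Experts routing in one sentence.",
        "List three uses of a 128K-token context window.",
    ]
    # The small active-parameter count keeps concurrent sessions inexpensive.
    answers = await asyncio.gather(*(ask(client, p) for p in prompts))
    for prompt, answer in zip(prompts, answers):
        print(prompt, "->", answer)


asyncio.run(main())
```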

Performance Benchmarks

aquif-moe-400m delivers solid performance across multiple benchmarks, especially for its size:

| Benchmark | aquif-moe (0.4B) | Qwen 2.5 (0.5B) | Gemma 3 (1B) |
|-----------|------------------|-----------------|--------------|
| MMLU      | 26.6             | 45.4            | 26.5         |
| HumanEval | 32.3             | 22.0            | 8.1          |
| GSM8K     | 33.9             | 36.0            | 6.1          |
| Average   | 30.9             | 34.4            | 11.3         |

VRAM Efficiency

aquif-moe-400m excels in VRAM efficiency:

| Model     | Average Performance | VRAM (GB) | Performance per VRAM (points/GB) |
|-----------|---------------------|-----------|----------------------------------|
| aquif-moe | 30.9                | 0.4       | 77.3                             |
| Qwen 2.5  | 34.4                | 0.6       | 57.3                             |
| Gemma 3   | 11.3                | 1.0       | 11.3                             |
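
The efficiency figure is simply the benchmark average divided by the VRAM footprint. A quick check of the table values (a minimal sketch; the numbers are copied from the tables above):

```python
# Performance per VRAM = average benchmark score / VRAM footprint in GB.
# Scores and footprints are the rounded values from the tables above.
models = {
    "aquif-moe": (30.9, 0.4),
    "Qwen 2.5": (34.4, 0.6),
    "Gemma 3": (11.3, 1.0),
}

for name, (avg_score, vram_gb) in models.items():
    # Small differences vs. the table come from rounding the averages first.
    print(f"{name}: {avg_score / vram_gb:.1f} points per GB of VRAM")
```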

Use Cases

  • Edge computing and resource-constrained environments
  • Mobile and embedded applications
  • Local development environments
  • Quick prototyping and testing
  • Personal assistants on consumer hardware
  • Enterprise deployment with multiple concurrent sessions
  • Long document analysis and summarization
  • High-throughput production environments

Limitations

  • No thinking mode capability
  • May hallucinate in some areas
  • May struggle with more complex reasoning tasks
  • Not optimized for specialized domains

Getting Started

To run via Ollama:

ollama run aquiffoo/aquif-moe-400m
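
For programmatic use, here is a minimal sketch with the official ollama Python client, assuming the Ollama server is running and the model has been pulled locally; the option values are illustrative (temperature 0.7 is the model's default from its parameters file).

```python
import ollama  # official Python client for a local Ollama server

response = ollama.chat(
    model="aquiffoo/aquif-moe-400m",
    messages=[{"role": "user", "content": "Summarize Mixture of Experts in one sentence."}],
    options={
        "temperature": 0.7,  # model default
        "num_ctx": 8192,     # raise toward 128K only if you have memory for the KV cache
    },
)
print(response["message"]["content"])
```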