
Our smallest model is also a Mixture of Experts. With 0.4B active and 1.3B total parameters, it outperforms much larger models in efficiency while remaining competitive in raw performance.

Model file: granitemoe architecture · 1.33B parameters · Q8_0 quantization · 1.4GB · Apache License 2.0. Default parameters: temperature 0.7.

Readme

aquif-moe-400m

aquif-moe-400m is our compact Mixture of Experts (MoE) model, with only 400 million active parameters. It offers impressive performance-per-VRAM efficiency, making it a strong choice for resource-limited setups.

Model Overview

  • Name: aquif-moe-400m
  • Parameters: 400 million active parameters (1.3 billion total)
  • Context Window: 128,000 tokens
  • Architecture: Mixture of Experts (MoE)
  • Type: General-purpose LLM
  • Hosted on: Ollama, Hugging Face

Key Features

  • Highly efficient VRAM utilization (77.3 performance points per GB)
  • Expansive 128K token context window for handling long documents
  • Competitive performance despite fewer parameters
  • Optimized for local inference on consumer hardware
  • Ideal for resource-constrained environments
  • Supports high-throughput concurrent sessions (see the sketch after this list)
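
Because the active-parameter count is small, several sessions can share a single local Ollama instance cheaply. Below is a minimal sketch of concurrent requests, assuming the model has already been pulled and the official ollama Python client is installed; the prompts and helper names are illustrative, and the response access pattern may differ slightly between client versions.

```python
import asyncio
from ollama import AsyncClient  # assumes the official ollama Python package is installed


async def ask(client: AsyncClient, prompt: str) -> str:
    # Each call is an independent chat session against the same local model.
    response = await client.chat(
        model="aquiffoo/aquif-moe-400m",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]


async def main() -> None:
    client = AsyncClient()
    prompts = [
        "Summarize Mixture of Experts routing in one sentence.",
        "List three uses of a 128K-token context window.",
    ]
    # The small active-parameter count keeps concurrent sessions inexpensive.
    answers = await asyncio.gather(*(ask(client, p) for p in prompts))
    for prompt, answer in zip(prompts, answers):
        print(prompt, "->", answer)


asyncio.run(main())
```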

Performance Benchmarks

aquif-moe-400m delivers solid performance across multiple benchmarks, especially for its size:

| Benchmark | aquif-moe (0.4B) | Qwen 2.5 (0.5B) | Gemma 3 (1B) |
|-----------|------------------|-----------------|--------------|
| MMLU      | 26.6             | 45.4            | 26.5         |
| HumanEval | 32.3             | 22.0            | 8.1          |
| GSM8K     | 33.9             | 36.0            | 6.1          |
| Average   | 30.9             | 34.4            | 11.3         |

VRAM Efficiency

aquif-moe-400m excels in VRAM efficiency:

| Model     | Average Performance | VRAM (GB) | Performance per VRAM (points/GB) |
|-----------|---------------------|-----------|----------------------------------|
| aquif-moe | 30.9                | 0.4       | 77.3                             |
| Qwen 2.5  | 34.4                | 0.6       | 57.3                             |
| Gemma 3   | 11.3                | 1.0       | 11.3                             |
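
The efficiency figure is simply the benchmark average divided by the VRAM footprint. A quick check of the table values (a minimal sketch; the numbers are copied from the tables above):

```python
# Performance per VRAM = average benchmark score / VRAM footprint in GB.
# Scores and footprints are the rounded values from the tables above.
models = {
    "aquif-moe": (30.9, 0.4),
    "Qwen 2.5": (34.4, 0.6),
    "Gemma 3": (11.3, 1.0),
}

for name, (avg_score, vram_gb) in models.items():
    # Small differences vs. the table come from rounding the averages first.
    print(f"{name}: {avg_score / vram_gb:.1f} points per GB of VRAM")
```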

Use Cases

  • Edge computing and resource-constrained environments
  • Mobile and embedded applications
  • Local development environments
  • Quick prototyping and testing
  • Personal assistants on consumer hardware
  • Enterprise deployment with multiple concurrent sessions
  • Long document analysis and summarization
  • High-throughput production environments

Limitations

  • No thinking mode capability
  • May hallucinate in some areas
  • May struggle with more complex reasoning tasks
  • Not optimized for specialized domains

Getting Started

To run via Ollama:

ollama run aquiffoo/aquif-moe-400m
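
For programmatic use, here is a minimal sketch with the official ollama Python client, assuming the Ollama server is running and the model has been pulled locally; the option values are illustrative (temperature 0.7 is the model's default from its parameters file).

```python
import ollama  # official Python client for a local Ollama server

response = ollama.chat(
    model="aquiffoo/aquif-moe-400m",
    messages=[{"role": "user", "content": "Summarize Mixture of Experts in one sentence."}],
    options={
        "temperature": 0.7,  # model default
        "num_ctx": 8192,     # raise toward 128K only if you have memory for the KV cache
    },
)
print(response["message"]["content"])
```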