1,599 2 months ago

vision

5eea018fd62e · 4.1GB
qwen2vl · 3.09B · Q8_0
clip · 669M · Q8_0
{{- if .System -}} <|im_start|>system {{ .System }}<|im_end|> {{- end -}} {{- range $i, $_ := .Messa
{ "temperature": 0.0001 }

Readme

yasserrmd/Nanonets-OCR2-3B (8-bit)

  • Base model: nanonets/Nanonets-OCR2-3B
  • Type: Multimodal OCR & document understanding (images → structured text, tables, LaTeX, captions)
  • Precision: 8-bit quantized for efficient inference
  • Params: ~3B
  • Format: GGUF / Ollama compatible


⚙️ Usage (Ollama)

ollama pull yasserrmd/Nanonets-OCR2-3B:q8_0
ollama run yasserrmd/Nanonets-OCR2-3B:q8_0

Example prompt:

Extract all text, tables, and equations from the uploaded document image.
Return tables in HTML and equations in LaTeX.

You can also use it via API:

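import requests

# Ask the local Ollama server for a single, non-streamed JSON reply.
resp = requests.post("http://localhost:11434/api/generate",
                     json={"model": "yasserrmd/Nanonets-OCR2-3B:q8_0",
                           "prompt": "<your prompt here>",
                           "stream": False})
print(resp.json()["response"])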

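Because the model works on document images, an API request normally needs an image attached. The sketch below is one way to do that, assuming a local Ollama server on the default port and using the `images` field of `/api/generate`, which takes base64-encoded image data. The helper names `build_payload` and `ocr_image` are illustrative and not part of any library. It uses only the standard library so it runs without extra installs.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "yasserrmd/Nanonets-OCR2-3B:q8_0"


def build_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a non-streaming /api/generate request body with one attached image."""
    return {
        "model": MODEL,
        "prompt": prompt,
        # Ollama expects each image as a base64-encoded string.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Disable streaming so the reply arrives as a single JSON object.
        "stream": False,
    }


def ocr_image(path: str, prompt: str) -> str:
    """Send the image at `path` to the local server and return the generated text."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, a call such as `ocr_image("page.png", "Extract all text, tables, and equations from the uploaded document image.")` should return the extracted text; `page.png` is a placeholder file name.
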
📘 Notes

  • Original documentation, evaluation, and architecture: see the Hugging Face model page for nanonets/Nanonets-OCR2-3B.
  • Use high-resolution input images for better OCR accuracy.
  • 8-bit quantization reduces memory use and speeds up inference, with minimal quality loss.
  • Best suited for document parsing, forms, and scanned PDFs.
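
When the prompt asks for tables in HTML, as in the example prompt above, the response can be post-processed into rows and cells. The following is a minimal sketch using Python's standard `html.parser`. It assumes the model emits plain, non-nested `<table>` markup, which is not guaranteed; `TableExtractor` and `extract_tables` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in the input as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # row currently being filled, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example prose around the table) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(model_output: str):
    """Return all tables found in the model output (nested tables are not handled)."""
    parser = TableExtractor()
    parser.feed(model_output)
    parser.close()
    return parser.tables
```

Each table comes back as a list of rows, and each row as a list of cell strings, which can then be written to CSV or loaded into a spreadsheet.
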