
Phi-4-mini is the latest small LLM from Microsoft. These are the quants I like to use that weren’t uploaded to the main model tags by the Ollama team.


Notes from Sam:

This model was quantized from the FP16 weights using Ollama. I’ve provided it in Q4_K_S (default) and Q5_K_M. These quants were not uploaded by the Ollama team in the main model tags, but I find them useful, so I wanted people to have easy access. I’ve also set these two parameters as defaults:

Temperature: 0.3 - this was on a whim

Context Length: 512 tokens - I’m in VRAM poverty and these Phi models are only getting less and less efficient

Obviously you can change these settings easily by using a Modelfile, or by setting the parameters manually in your UI of choice.
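For instance, a minimal Modelfile sketch along these lines raises both values back up. The `FROM` tag and the `phi4-mini-4k` name are placeholders, not this page’s actual tags; substitute whichever tag you pulled:

```
# Minimal Modelfile sketch; swap the FROM tag for the tag you pulled
FROM phi4-mini:q5_K_M
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

Build and run it with `ollama create phi4-mini-4k -f Modelfile` and `ollama run phi4-mini-4k`. For a one-off change, you can also type `/set parameter num_ctx 4096` inside an interactive `ollama run` session.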


Phi-4-mini-instruct Model Summary

Phi-4-mini-instruct is a lightweight, open-source language model with 3.8 billion parameters, developed by Microsoft. It’s designed for broad commercial and research use, particularly in memory/compute-constrained environments and latency-bound scenarios, with a focus on strong reasoning capabilities (especially in math and logic).

Key Features:

  • Training Data: Trained on 5T tokens, consisting of synthetic data, filtered publicly available websites, high-quality educational data, and code. The data emphasizes high-quality, reasoning-dense information.
  • Architecture: Dense decoder-only Transformer model with a 128K token context length, 200K vocabulary, grouped-query attention, and shared input/output embedding.
  • Intended Use: General-purpose AI systems and applications requiring strong reasoning, especially math and logic. It can be used as a building block for generative AI-powered features.
  • Enhanced Capabilities: Includes supervised fine-tuning and direct preference optimization for instruction adherence and safety.
  • Multilingual Support: Supports multiple languages, including English, Chinese, Spanish, and others.
  • Performance: Achieves multilingual language understanding and reasoning comparable to much larger models on various benchmarks.
  • Input Formats: Best suited for prompts using the model’s chat format or its tool-enabled function-calling format.
  • Inference: Can be used with vLLM or Transformers; a minimal Transformers sketch follows this list.
  • Responsible AI Considerations: Developers should consider limitations such as potential for unfairness, unreliability, offensive content, and factual inaccuracies, and apply responsible AI best practices.
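To make those last two points concrete, here is a minimal Transformers sketch. It assumes the official microsoft/Phi-4-mini-instruct weights from Hugging Face (not the GGUF quants on this page, which are for Ollama), a recent transformers with accelerate installed, and enough memory to load the model; the prompt is illustrative only.

```python
import torch
from transformers import pipeline

# A minimal sketch, assuming the official Hugging Face weights; the
# quantized builds on this page are meant to be run through Ollama instead.
pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-format input; the tokenizer's chat template renders it into the
# model's expected prompt (roughly <|system|>...<|end|><|user|>...<|end|><|assistant|>).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "If 3x - 7 = 11, what is x?"},
]

out = pipe(messages, max_new_tokens=128, do_sample=False)
# For chat input, generated_text holds the whole conversation;
# the last entry is the assistant's reply.
print(out[0]["generated_text"][-1]["content"])
```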

Release Notes:

This release incorporates user feedback from the Phi-3 series and introduces a new architecture for efficiency, a larger vocabulary for multilingual support, and improved post-training techniques for instruction following and function calling.

Citations:

Phi-4-mini-instruct model card on Hugging Face