Notes from Sam:
This model has been quantized from the fp16 model using ollama. I've provided it in Q4_K_S (default) and Q5_K_M. These quants were not uploaded by the ollama team in the main model tags, but I find them useful, so I wanted people to have easy access. I've also changed two of the default parameters:
Temperature: 0.3 - this was on a whim
Context Length: 512 tokens - I’m in VRAM poverty and these phi models are only getting less and less efficient
You can easily change these settings back with a Modelfile, or by setting the parameters manually in your UI of choice.
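As a sketch of the Modelfile approach, something like the following would raise both parameters back toward more typical values. The base tag `phi4-mini` and the derived name `phi4-mini-custom` are placeholders; substitute whichever tag you actually pulled:

```
# Modelfile — "phi4-mini" is a placeholder for the base tag you pulled
FROM phi4-mini
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

Then `ollama create phi4-mini-custom -f Modelfile` builds a variant with those parameters baked in.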
Phi-4-mini-instruct Model Summary
Phi-4-mini-instruct is a lightweight, open-source language model with 3.8 billion parameters, developed by Microsoft. It’s designed for broad commercial and research use, particularly in memory/compute-constrained environments and latency-bound scenarios, with a focus on strong reasoning capabilities (especially in math and logic).
Key Features:
- Training Data: Trained on 5T tokens, consisting of synthetic data, filtered publicly available websites, high-quality educational data, and code. The data emphasizes high-quality, reasoning-dense information.
- Architecture: Dense decoder-only Transformer with a 128K-token context length, a 200K-token vocabulary, grouped-query attention, and shared input/output embeddings.
- Intended Use: General-purpose AI systems and applications requiring strong reasoning, especially math and logic. It can be used as a building block for generative AI-powered features.
- Enhanced Capabilities: Includes supervised fine-tuning and direct preference optimization for instruction adherence and safety.
- Multilingual Support: Supports many languages, including English, Chinese, and Spanish.
- Performance: Achieves multilingual language understanding and reasoning comparable to much larger models on various benchmarks.
- Input Formats: Best suited for prompts using a specific chat format or a tool-enabled function-calling format.
- Inference: Can be used with vLLM or Transformers.
- Responsible AI Considerations: Developers should consider limitations such as potential for unfairness, unreliability, offensive content, and factual inaccuracies, and apply responsible AI best practices.
Release Notes:
This release incorporates user feedback from the Phi-3 series and introduces a new architecture for efficiency, a larger vocabulary for multilingual support, and improved post-training techniques for instruction following and function calling.
Citations:
HuggingFace