Blog · Ollama

OpenAI gpt-oss-safeguard

October 29, 2025

Ollama is partnering with OpenAI and ROOST (Robust Open Online Safety Tools) to bring the latest gpt-oss-safeguard reasoning models to users for safety classification tasks. gpt-oss-safeguard models are available in two sizes: 20B and 120B, and are permissively licensed under the Apache 2.0 license.

MiniMax M2

October 28, 2025

MiniMax M2 is now available on Ollama's cloud. It's a model built for coding and agentic workflows.

NVIDIA DGX Spark performance

October 23, 2025

We ran performance tests on release day firmware and an updated Ollama version to see how Ollama performs.

New coding models & integrations

October 16, 2025

GLM-4.6 and Qwen3-coder-480B are available on Ollama’s cloud service with easy integrations to the tools you are familiar with. Qwen3-Coder-30B has been updated for faster, more reliable tool calling in Ollama’s new engine.

Qwen3-VL

October 14, 2025

Ollama now supports Alibaba's Qwen3-VL.

NVIDIA DGX Spark

October 13, 2025

The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it runs fast and efficiently out-of-the-box.

Web search

September 24, 2025

A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama’s cloud.

New model scheduling

September 23, 2025

Ollama now includes a significantly improved model scheduling system, reducing crashes due to out of memory issues, maximizing GPU utilization and performance, especially on multi-GPU systems.

Cloud models

September 19, 2025

Cloud models are now in preview, letting you run larger models with fast, datacenter-grade hardware. You can keep using your local tools while running larger models that wouldn’t fit on a personal computer.

OpenAI gpt-oss

August 5, 2025

Ollama partners with OpenAI to bring gpt-oss to Ollama and its community.

Ollama's new app

July 30, 2025

Ollama's new app is now available for macOS and Windows.

Secure Minions: private collaboration between Ollama and frontier models

June 3, 2025

Secure Minions is a secure protocol built by Stanford's Hazy Research lab to allow encrypted local-remote communication.

Thinking

May 30, 2025

Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking behavior for different applications and use cases.

Streaming responses with tool calling

May 28, 2025

Ollama now supports streaming responses with tool calling. This enables all chat applications to stream content and also call tools in real time.

Ollama's new engine for multimodal models

May 15, 2025

Ollama now supports new multimodal models with its new engine.

Minions: where local and cloud LLMs meet

February 25, 2025

Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré's Stanford Hazy Research lab, along with Avner May, Scott Linderman, James Zou, have developed a way to shift a substantial portion of LLM workloads to consumer devices by having small on-device models (such as Llama 3.2 with Ollama) collaborate with larger models in the cloud (such as GPT-4o).

Structured outputs

December 6, 2024

Ollama now supports structured outputs making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs.

Ollama Python library 0.4 with function calling improvements

November 25, 2024

With Ollama Python library version 0.4, functions can now be provided as tools. The library now also has full typing support and new examples have been added.

Llama 3.2 Vision

November 6, 2024

Llama 3.2 Vision 11B and 90B models are now available in Ollama.

IBM Granite 3.0 models

October 21, 2024

Ollama partners with IBM to bring Granite 3.0 models to Ollama.

Llama 3.2 goes small and multimodal

September 25, 2024

Ollama partners with Meta to bring Llama 3.2 to Ollama.

Reduce hallucinations with Bespoke-Minicheck

September 18, 2024

Bespoke-Minicheck is a new grounded factuality checking model developed by Bespoke Labs that is now available in Ollama. It can fact-check responses generated by other models to detect and reduce hallucinations.

Tool support

July 25, 2024

Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world.

Google Gemma 2

June 27, 2024

Gemma 2 is now available on Ollama in 3 sizes - 2B, 9B and 27B.

An entirely open-source AI code assistant inside your editor

May 31, 2024

Continue enables you to easily create your own coding assistant directly inside Visual Studio Code and JetBrains with open-source LLMs.

Google announces Firebase Genkit with Ollama support

May 20, 2024

At Google IO 2024, Google announced Ollama support in Firebase Genkit, a new open-source framework for developers to build, deploy and monitor production-ready AI-powered apps.

Llama 3 is not very censored

April 19, 2024

Compared to Llama 2, Llama 3 feels much less censored. Meta has substantially lowered false refusal rates. Llama 3 will refuse less than 1/3 of the prompts previously refused by Llama 2.

Llama 3

April 18, 2024

Llama 3 is now available to run on Ollama. This model is the next generation of Meta's state-of-the-art large language model, and is the most capable openly available LLM to date.

Embedding models

April 8, 2024

Embedding models are available in Ollama, making it easy to generate vector embeddings for use in search and retrieval augmented generation (RAG) applications.

Ollama now supports AMD graphics cards

March 14, 2024

Ollama now supports AMD graphics cards in preview on Windows and Linux. All the features of Ollama can now be accelerated by AMD graphics cards on Ollama for Linux and Windows.

Windows preview

February 15, 2024

Ollama is now available on Windows in preview, making it possible to pull, run and create large language models in a new native Windows experience. Ollama on Windows includes built-in GPU acceleration, access to the full model library, and serves the Ollama API including OpenAI compatibility.

OpenAI compatibility

February 8, 2024

Ollama now has initial compatibility with the OpenAI Chat Completions API, making it possible to use existing tooling built for OpenAI with local models via Ollama.

Vision models

February 2, 2024

New vision models are now available: LLaVA 1.6, in 7B, 13B and 34B parameter sizes. These models support higher resolution images, improved text recognition and logical reasoning.

Python & JavaScript Libraries

January 23, 2024

The initial versions of the Ollama Python and JavaScript libraries are now available, making it easy to integrate your Python or JavaScript, or Typescript app with Ollama in a few lines of code. Both libraries include all the features of the Ollama REST API, are familiar in design, and compatible with new and previous versions of Ollama.