InternVL3 is a Qwen2.5-based multimodal large language model from OpenGVLab that represents a significant advancement over its predecessor, InternVL 2.5.

InternVL3 Summary

Key Improvements

Enhanced Core Capabilities

Superior multimodal perception and reasoning compared with InternVL 2.5
Better overall text performance than the corresponding Qwen2.5 Chat models

Expanded Functionality

Tool usage integration
GUI agent capabilities
Industrial image analysis
3D vision perception
Additional multimodal applications

Technical Innovation

The model benefits from Native Multimodal Pre-Training, which allows it to outperform even the Qwen2.5 series on text tasks, even though its language component is initialized from Qwen2.5’s pre-trained base models.

Bottom Line

InternVL3 combines stronger multimodal perception, reasoning, and text performance with a broader range of practical applications, including tool usage, GUI agents, industrial image analysis, and 3D vision perception.