October 15: Alibaba’s Tongyi Qianwen team today launched its strongest compact vision-language models yet, Qwen3-VL-4B and Qwen3-VL-8B. Both sizes come in “Instruct” and “Thinking” variants and outperform top-tier models such as Gemini 2.5 Flash Lite and GPT-5 Nano across dozens of authoritative benchmarks.
Key Advantages:
- Lower resource barriers: The smaller model sizes significantly cut VRAM usage, so the models can be deployed on a broader range of hardware, including Macs with 16GB of memory.
- Full core capabilities: Despite their compact size, the models retain all the key features of the flagship Qwen3-VL.
- Benchmark dominance: They surpass Gemini 2.5 Flash Lite and GPT-5 Nano in STEM, VQA, OCR, video understanding, and agent tasks, and even rival Alibaba’s own 72B model released six months earlier.
As shown in the figure below, Qwen3-VL-8B Instruct achieved state-of-the-art (SOTA) multimodal results on 30 authoritative benchmarks, including MIABench, OCRBench, SUNRGBD, ERQA, VideoMMMU, and ScreenSpot, surpassing top models such as Gemini 2.5 Flash Lite, GPT-5 Nano, and Qwen2.5-VL-72B.
Performance Highlights
Multimodal Mastery
- Qwen3-VL-8B Instruct achieved SOTA results on 30 benchmarks, including MIABench, OCRBench, and VideoMMMU.
- Qwen3-VL-4B Instruct competes head-to-head with Gemini 2.5 Flash Lite and GPT-5 Nano despite its smaller size.
Text & Reasoning Prowess
Both models show improved text performance over their predecessors.
Advanced “Thinking” Variants
- Qwen3-VL-8B Thinking achieved SOTA results on 23 benchmarks, including MathVision and HallusionBench.
- Qwen3-VL-4B Thinking continues the “small but mighty” trend.
Global Impact
Since its September 24 open-source release, Qwen3-VL has rapidly gained recognition:
- 🥈 #2 on Chatbot Arena’s Vision Leaderboard (top open-source vision model)
- 🥇 #1 in OpenRouter’s image processing category (48% market share)
- 🌍 First open-source model to lead in both text and vision arenas

Developers worldwide celebrated the news:
“Finally! Runs on my 16GB Mac!” “Been waiting for this!”
Resources
Source: X, Hugging Face, Tongyi Qianwen Official

