# vision

All posts tagged "vision".

Gemma 4 12B QAT lands — +17% speed, −39% VRAM, 65K context on 24 GB consumer Blackwell

05.06.2026

Google released the QAT (Quantization-Aware Training) variants of Gemma 4 today at 1pm UTC. Three hours later, Olares One is running on them. On the 12B: 102.78 t/s vs 87.5 baseline = +17.4% speed. 8.6 GB VRAM vs ~14 GB = −39%. Context 32K → 65K with margin to spare. Tool calling intact, vision intact (modulo an mmproj gotcha I explain below).
Read →
Vision unlocked on Qwen3.6 35B-A3B MTP — 243 t/s + 262K context + image input via spiritbuun's --mmproj-gpu-swap

24.05.2026

Three days ago I shipped Qwen3.6 35B-A3B MTP at 249 t/s text-only on Olares One, the new champion. Yesterday I shipped Gemma 4 26B at 250 t/s with vision. Today the Qwen champion gets vision too. Same 24 GB GPU. Same model file. The unlock: on May 22 spiritbuun merged a feature called --mmproj-gpu-swap that hot-swaps MTP and the vision encoder in VRAM on demand. Trade-off: −2.8% text throughput in exchange for full vision support and 4× more context than my v1.0.5 vision attempt.
Read →
Gemma 4 26B Vision at 250 t/s — vLLM v0.21 closed the gap with my text-only champion

23.05.2026

Two days ago I shipped Qwen 3.6 35B-A3B MTP at 249 t/s on Olares One. Text-only, but the new champion. Today the same hardware runs Gemma 4 26B at 250 t/s with vision and tool calling. The unlock: vLLM v0.21 quietly merged the official Google Gemma 4 MTP drafter. No more 5-fast/4-slow cycle bug from DFlash. No more 135 t/s no-spec fallback. Just full speed, plus images.
Read →
Gemma 4 26B-A4B vision via vLLM — 135 t/s at 128K for an office workhorse on 24 GB

15.05.2026

An Olares One peer user shared a Discord patch to restore vision on the gemma426ba4bone chart. 24 hours later, I shipped a vLLM variant hitting 135 t/s at 128K context — and the same user validated it in production. The story of a community-driven engineering loop, four llama.cpp configs benched in parallel, and the moment turbo3 KV stopped being the answer.
Read →
BeeLlama Qwen3.6 27B with vision — 106 t/s at 200K on consumer Blackwell mobile

15.05.2026

Follow-up to last night's BeeLlama text-only 262K post: I added the mmproj vision projector to Qwen3.6 27B, expected a perf hit, and got a counter-intuitive surprise. BeeLlama supports vision and DFlash spec decoding together (a combination that crashes on Gemma 4). And 200K context outperforms 128K by 4.4%. First public sm_120 BeeLlama vision bench.
Read →
