# mmproj

All posts tagged "mmproj".

Vision unlocked on Qwen3.6 35B-A3B MTP — 243 t/s + 262K context + image input via spiritbuun's --mmproj-gpu-swap

24.05.2026

Three days ago I shipped Qwen3.6 35B-A3B MTP at 249 t/s text-only on Olares One, the new champion. Yesterday I shipped Gemma 4 26B at 250 t/s with vision. Today the Qwen champion gets vision too: same 24 GB GPU, same model file. The unlock: on May 22, spiritbuun merged a feature called --mmproj-gpu-swap that hot-swaps MTP and the vision encoder in VRAM on demand. The trade-off is 2.8% lower text throughput in exchange for full vision support and 4× more context than my v1.0.5 vision attempt.
Read →
Gemma 4 26B-A4B vision via vLLM — 135 t/s at 128K for an office workhorse on 24 GB

15.05.2026

A fellow Olares One user shared a Discord patch to restore vision on the gemma426ba4bone chart. Twenty-four hours later, I shipped a vLLM variant hitting 135 t/s at 128K context, and the same user validated it in production. This is the story of a community-driven engineering loop, four llama.cpp configs benched in parallel, and the moment turbo3 KV stopped being the answer.
Read →
BeeLlama Qwen3.6 27B with vision — 106 t/s at 200K on consumer Blackwell mobile

15.05.2026

Follow-up to last night's BeeLlama text-only 262K post: I added the mmproj vision projector to Qwen3.6 27B expecting a performance hit and got a counter-intuitive surprise instead. BeeLlama supports vision and DFlash speculative decoding together (a combination that crashes on Gemma 4), and 200K context outperforms 128K by 4.4%. This is the first public sm_120 BeeLlama vision bench.
Read →