
# upstream

All posts tagged "upstream".

Gemma 4 audio E4B hits 288 t/s — the second upstream merge closes the family

09.06.2026

Yesterday I shipped Gemma 4 12B at 170 t/s via the upstream merge of PR #23398. Today PR #24282 (the E2B/E4B counterpart) merged. Custom rebuild, chart swap, bench: Gemma 4 audio E4B jumps from 47 t/s to 288 t/s, a 6.1x speedup on the same hardware for 5 minutes of config. There is a flash-attention trap along the way: the combination of Gemma 4 E4B + audio mmproj + MTP draft crashes the CUDA flash-attention kernel, and falling back to no-FA unlocks everything.
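The 6.1x figure is just the ratio of the two measured throughputs. A minimal check, using only the numbers quoted above:

```python
# Throughput figures quoted in the post, in tokens per second.
before_tps = 47   # Gemma 4 audio E4B before the rebuild
after_tps = 288   # Gemma 4 audio E4B after rebuilding with PR #24282

# Speedup is the ratio new / old.
speedup = after_tps / before_tps
print(f"{speedup:.1f}x")  # prints 6.1x
```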
Read →
Gemma 4 12B hits 170 t/s — upstream merge buys +67% speed for free

08.06.2026

Two days ago I shipped Gemma 4 12B QAT at 102 t/s on Olares One. Today I ship 170 t/s. Same hardware. Same model file. Same drafter. Same context. The delta: am17an's PR #23398 (Gemma 4 MTP support) merged into llama.cpp upstream at 12:50 UTC. My custom image, a snapshot of the WIP branch at commit dd97604, was missing 10+ polish commits that ggerganov required during review. +67% speed on the exact same setup, just by rebasing. Bonus: a critical insight into Olares One's NVIDIA driver, which caps CUDA at 13.1 and blocks the whole upstream Docker ecosystem.
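The +67% figure is the relative gain between the two throughputs. A minimal check, using only the numbers quoted above:

```python
# Throughput figures quoted in the post, in tokens per second.
before_tps = 102  # custom image built from the WIP branch snapshot (commit dd97604)
after_tps = 170   # rebuilt on upstream llama.cpp after PR #23398 merged

# Relative gain in percent: (new / old - 1) * 100.
gain_pct = (after_tps / before_tps - 1) * 100
print(f"+{gain_pct:.0f}%")  # prints +67%
```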
Read →