Tag · audio
All posts tagged "audio".
Yesterday I shipped Gemma 4 12B at 170 t/s via the upstream PR #23398 merge. Today PR #24282 (the E2B/E4B counterpart) merged. Custom rebuild, chart swap, bench: Gemma 4 audio E4B jumps from 47 t/s to 288 t/s. That is a 6.1x speedup on the same hardware for about 5 minutes of config. One flash-attention trap along the way: the combination of Gemma 4 E4B, the audio mmproj, and an MTP draft crashes the CUDA flash-attention kernel. Falling back to no flash attention (no-FA) unlocks everything.
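
As a quick sanity check on the headline number, here is a minimal sketch that recomputes the speedup from the two throughput figures quoted above. The 1000-token response length and the helper names are assumptions chosen for illustration, not something from the benchmark itself.

```python
# Throughput figures quoted in the post, in tokens per second.
before_tps = 47.0   # Gemma 4 audio E4B before the PR #24282 rebuild
after_tps = 288.0   # same model and hardware after the rebuild


def speedup(before: float, after: float) -> float:
    """Ratio of new throughput to old throughput."""
    return after / before


def seconds_for(tokens: int, tps: float) -> float:
    """Time to generate `tokens` tokens at a steady `tps` rate."""
    return tokens / tps


ratio = speedup(before_tps, after_tps)
print(f"speedup: {ratio:.1f}x")

# Hypothetical 1000-token response, used only to make the ratio tangible.
tokens = 1000
print(f"{tokens} tokens: {seconds_for(tokens, before_tps):.1f}s "
      f"-> {seconds_for(tokens, after_tps):.1f}s")
```

The ratio only holds for comparable runs: same prompt, same hardware, same sampling settings.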