# quantization

All posts tagged "quantization".

Gemma 4 12B QAT lands — +17% speed, −39% VRAM, 65K context on 24 GB consumer Blackwell

05.06.2026

Google released the QAT (Quantization-Aware Training) variants of Gemma 4 today at 1pm UTC. Three hours later, Olares One is running on them. On the 12B: 102.78 t/s vs 87.5 baseline = +17.4% speed. 8.6 GB VRAM vs ~14 GB = −39%. Context 32K → 65K with margin to spare. Tool calling intact, vision intact (modulo an mmproj gotcha I explain below).
Read →