Tag · quantization
All posts tagged "quantization".
Google released the QAT (Quantization-Aware Training) variants of Gemma 4 today at 1pm UTC. Three hours later, Olares One was running on them. On the 12B model: 102.78 t/s vs. an 87.5 t/s baseline (+17.4% speed), 8.6 GB of VRAM vs. ~14 GB (−39%), and context raised from 32K to 65K with margin to spare. Tool calling and vision both still work (modulo an mmproj gotcha I explain below).
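To make the arithmetic behind the 12B figures easy to check, here is a minimal sketch that recomputes the relative changes from the numbers quoted above. The function and constant names are illustrative only, and the ~14 GB VRAM baseline is treated as exactly 14.0 GB, which is an assumption. From these rounded inputs the script gives about +17.5% speed and −38.6% VRAM, which agree with the quoted +17.4% and −39% within the rounding of the published figures.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent (negative means a reduction)."""
    return (new - old) / old * 100.0


# Figures quoted for the 12B model.
# Assumption: the "~14 GB" VRAM baseline is taken as exactly 14.0 GB.
SPEED_QAT, SPEED_BASE = 102.78, 87.5  # tokens/s, as reported
VRAM_QAT, VRAM_BASE = 8.6, 14.0       # GB of VRAM

if __name__ == "__main__":
    print(f"speed: {pct_change(SPEED_QAT, SPEED_BASE):+.1f}%")
    print(f"VRAM:  {pct_change(VRAM_QAT, VRAM_BASE):+.1f}%")
```

Because the baseline is only given to one decimal place, a one-decimal result can differ slightly from the quoted +17.4%.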