Tag · tool-calling
All posts tagged "tool-calling".
Two days ago I shipped Qwen 3.6 35B-A3B MTP at 249 t/s on Olares One. Text-only, but the new champion. Today the same hardware runs Gemma 4 26B at 250 t/s with vision and tool calling. The unlock: vLLM v0.21 quietly merged the official Google Gemma 4 MTP drafter. No more 5-fast/4-slow cycle bug from DFlash. No more falling back to 135 t/s with speculative decoding turned off. Just full speed, plus images.