Tag · gemma4
All posts tagged "gemma4".
Full num_speculative_tokens sweep for Gemma 4 26B-A4B with the z-lab DFlash drafter on an RTX 5090M Laptop (24GB, sm_120). The optimum is n_spec=8, not n_spec=15 as on desktop. I also found a 100% reproducible vLLM degradation cycle that I couldn't fix from config alone.