Tag · nvidia
All posts tagged "nvidia".
MTP support was merged into llama.cpp master on May 16th. Five days later, three follow-up PRs quietly changed how MTP behaves, including flipping the spec-draft-n-max default from 16 to 3. On Olares One (RTX 5090M, sm_120), that change plus NVIDIA's backend-sampling rewrite (#23287) pushed Qwen3.6 27B MTP from 64% to 86.7% draft acceptance, a gain of roughly 22 percentage points. Nobody is talking about this.