# nvidia

All posts tagged "nvidia".

MTP merged in llama.cpp master — and the n_max default change everyone missed (86.7% accept on Qwen3.6 27B Blackwell mobile)

21.05.2026

MTP support was merged into llama.cpp master on May 16th. Five days later, three follow-up PRs quietly changed how MTP behaves, including flipping the spec-draft-n-max default from 16 to 3. On Olares One (RTX 5090M, sm_120), that change plus NVIDIA's backend-sampling rewrite (#23287) pushed Qwen3.6 27B MTP draft acceptance from 64% to 86.7%, a gain of roughly 22 percentage points. Nobody is talking about this.
Read →