# mtp

All posts tagged "mtp".

Qwen3.6-27B on upstream llama.cpp: +123% for free with MTP, no fork to maintain

05.05.2026

MTP finally lands in upstream llama.cpp (PR #22673 by am17an, May 4). Benchmarked on an Olares One (RTX 5090M, sm_120): 78 t/s with an MTP-enabled GGUF, +123% vs. baseline. No Lucebox, no Genesis, no permanent custom fork.
Read →

Genesis on consumer Blackwell — TurboQuant unlocked for Qwen3.6-27B on 24GB

28.04.2026

Sandermage Genesis patches validated on RTX 5090M (sm_120). TurboQuant 4-bit + MTP n=3 on Qwen3.6-27B → 60 t/s, 100K context, 177K KV tokens.
Read →

Qwen3.6-27B at 85-100 t/s on a 24GB RTX 5090 Laptop GPU

26.04.2026

Adapting the 32GB desktop and 24GB Ampere recipes to a 24GB consumer mobile Blackwell GPU (sm_120). Custom vLLM image, AutoRound INT4, and MTP n=3: sustained 85-100 t/s with 75K context.
Read →