# llama-cpp
All posts tagged "llama-cpp".
Qwen3.6-27B MTP via llama.cpp PR #22673 on consumer Blackwell — 78 t/s with no fork, no patch
Multi-token prediction (MTP) finally lands in llama.cpp upstream (PR #22673 by am17an, May 4). Benchmarked on an Olares One (RTX 5090M, sm_120): 78 t/s with an MTP-enabled GGUF, a 123% gain over baseline. No Lucebox, no Genesis, no permanent custom fork needed.
Read →
DFlash unblocked on 24GB consumer Blackwell — 80 t/s, 3 days after the "impossible" post
Three days ago I wrote that the stock DFlash path didn't fit in 24 GB on consumer hardware. Spoiler: it works now, via buun-llama-cpp plus a Q8_0 GGUF spiritbuun drafter, averaging 80 t/s on an Olares One (sm_120 mobile Blackwell).
Read →