# llama-cpp
All posts tagged "llama-cpp".
Qwen3.6-27B MTP via llama.cpp PR #22673 on consumer Blackwell — 78 t/s with no fork, no patch
Multi-token prediction (MTP) finally lands in llama.cpp upstream (PR #22673 by am17an, May 4). Benchmarked on an Olares One (RTX 5090M, sm_120): 78 t/s with an MTP-enabled GGUF, a 123% gain over baseline. No Lucebox, no Genesis, no permanent custom fork needed.
Read →
DFlash unblocked on 24GB consumer Blackwell — 80 t/s, 3 days after the "impossible" post
Three days ago I wrote that the stock DFlash path didn't fit in 24 GB on consumer hardware. Spoiler: it works now, via buun-llama-cpp plus a Q8_0 GGUF spiritbuun drafter, averaging 80 t/s on an Olares One (sm_120 mobile Blackwell).
Read →