Archive
All the posts.
Everything I've tested, tuned, benched or discovered. Latest to oldest.
-
Qwen3.6-27B on upstream llama.cpp: +123% free with MTP, zero fork to maintain
MTP finally lands in llama.cpp upstream (PR #22673 by am17an, May 4). Bench on Olares One RTX 5090M sm_120: 78 t/s with an MTP-enabled GGUF, +123% vs baseline. No Lucebox, no Genesis, no permanent custom fork.
Read → -
Lucebox on Olares One — Episode 9: the PR that promised +57% and delivered +0.2%
Last night Lucebox crossed 88.5 t/s on Olares One and became the new champion. This morning PR #94 reports +57% on RTX 4090. If it scales, we hit 120 t/s. Spoiler: 88.7 t/s. Full DDTree sweep, three hypotheses, the honest lesson on upstream benches that don't reproduce.
Read → -
Lucebox on Olares One — Episode 8: seven days of waiting, one lib swapped by hand, 88.5 t/s
Seven days after my PR #188 to HAMi-core, still no review. The saga had its cliffhanger — I was waiting on someone else. Then a stupid idea: compile my patched lib and swap it myself. Three new bugs, one night, and at the end Lucebox hits 88.5 t/s. First llama.cpp-based path to pass vLLM Turbo on this hardware.
Read → -
My personal Olares Market — 28 apps hand-tuned for the Olares One, one click away
A custom Olares Market hand-tuned for the RTX 5090M of the Olares One. 28 ready-to-install apps: llama.cpp, vLLM, DFlash, Voxtral ASR/TTS, vision, music. How to add it to your device in 30 seconds.
Read → -
DFlash unblocked on 24GB consumer Blackwell — 80 t/s, 4 days after the "impossible" post
Four days ago I wrote that DFlash on 24GB consumer Blackwell didn't fit. On April 28, a dev published a quantized drafter. On April 30, I built it, tested it, and got 0.97 t/s. On May 1, after I filed an issue, the dev fixed it within 24 hours. Tonight: 80 t/s. The story of a thesis that lasted 72 hours.
Read → -
Lucebox on Olares One — Episode 7: six HAMi hooks fixed upstream in one go
The bug is identified: 6 hooks in HAMi-core ignore the return value of cuCtxGetDevice. The fix is 50 lines. But for the entire HAMi community to benefit, it has to go upstream. Here's how that played out.
Read →