# blackwell

All posts tagged "blackwell".

Lucebox on Olares One — Episode 2: 2h of CUDA compile for 11 undefined references

28.04.2026

First Docker build. 2h13 of CUDA compile for sm_120, and at link time, ld dumps 11 undefined references to cuMemCreate, cuMemMap, cuMemAddressReserve. Why? Because libcuda.so.1 isn't where it should be.
Read →

Lucebox on Olares One — Episode 1: 134 t/s on RTX 3090, what about my rig?

28.04.2026

You're scrolling r/LocalLLaMA, you see a post claiming 134 t/s on Qwen3.6-27B with an RTX 3090 thanks to Lucebox. Of course you want to try it on your Olares One. Spoiler: it'll take 12 hours of compile time and 6 Docker builds. Episode 1.
Read →

Why DFlash on Qwen3.6-27B doesn't fit on a 24GB single GPU

28.04.2026

Three paths tested (z-lab BF16, AEON-7 NVFP4, Lucebox custom). All need ≥26 GB. VRAM math, honest negatives, what to wait for on 24GB.
Read →

Genesis on consumer Blackwell — TurboQuant unlocked for Qwen3.6-27B on 24GB

28.04.2026

Sandermage Genesis patches validated on RTX 5090M (sm_120). TurboQuant 4-bit + MTP n=3 on Qwen3.6-27B → 60 t/s, 100K context, 177K KV tokens.
Read →

Qwen3.6-27B at 85-100 t/s on a 24GB RTX 5090 Laptop GPU

26.04.2026

Adapting the 32GB desktop and 24GB Ampere recipes to a 24GB consumer mobile Blackwell GPU (sm_120). Custom vLLM image, AutoRound INT4, MTP n=3: sustained 85-100 t/s with 75K context.
Read →