# cuda
All posts tagged "cuda".
-
Gemma 4 12B hits 170 t/s — upstream merge buys +67% speed for free
Two days ago I shipped Gemma 4 12B QAT at 102 t/s on Olares One. Today I ship 170 t/s. Same hardware. Same model file. Same drafter. Same context. The delta: am17an's PR #23398 (Gemma 4 MTP support) was merged into upstream llama.cpp at 12:50 UTC. My custom image, a snapshot of the WIP branch at commit dd97604, was missing the 10+ polish commits that ggerganov required during review. That is +67% speed on the exact same setup, just by rebasing. Bonus: a critical finding that Olares One's NVIDIA driver caps CUDA at 13.1, which blocks the whole upstream Docker ecosystem.
Read → -
Lucebox on Olares One — Episode 6: We read the HAMi-core source and we find 6 bugs
NO_VMM doesn't fix anything. The `Illegal device id` bug comes back every run. Time to read the HAMi-core source. And what we find is not a single bug — it's a systemic pattern across 6 different hooks.
Read → -
Lucebox on Olares One — Episode 5: The runtime slams the door with a negative device id
Image pushed, pod deployed, models downloaded. Everything is ready. Then HAMi vGPU dumps `Illegal device id: -644371744` on every boot, with a different random number on each run. Smells like uninitialized stack memory from a mile away.
Read → -
Lucebox on Olares One — Episode 4: The llama-server submodule serves it up to you 1h later
`test_dflash` compiles, great. But to serve over HTTP I need `llama-server`, which is built from the submodule. And the submodule has its own CMake invocation, where I forgot to add `-rpath-link`. And boom, 1h later, here we go again.
Read → -
Lucebox on Olares One — Episode 3: LIBRARY_PATH isn't what you think it is
We added `LIBRARY_PATH` and a `libcuda.so.1` symlink, fired off another 2h compile, and `ld` threw the same error. Why? Because `LIBRARY_PATH` doesn't resolve indirect dependencies. You need `-Wl,-rpath-link`.
Read → -
Lucebox on Olares One — Episode 2: 2h of CUDA compile for 11 undefined references
First Docker build. 2h13 of CUDA compilation for `sm_120`, and at link time `ld` dumps 11 undefined references to `cuMemCreate`, `cuMemMap`, `cuMemAddressReserve`. Why? Because `libcuda.so.1` isn't where it should be.
Read →