Archive
All the posts.
Everything I've tested, tuned, benchmarked or discovered. Newest to oldest.
-
Lucebox on Olares One — Episode 7: Issue #187, PR #188, and 6 hooks fixed in one go
The bug is identified: 6 hooks in HAMi-core ignore the return value of `cuCtxGetDevice`. The fix is 50 lines. But for the entire HAMi community to benefit, it has to go upstream. Here's how that played out.
Read →
-
Lucebox on Olares One — Episode 6: We read the HAMi-core source and we find 6 bugs
NO_VMM doesn't fix anything. The `Illegal device id` bug comes back every run. Time to read the HAMi-core source. And what we find is not a single bug — it's a systemic pattern across 6 different hooks.
Read →
-
Lucebox on Olares One — Episode 5: The runtime slams the door with a negative device id
Image pushed, pod deployed, models downloaded. Everything is ready. Then HAMi vGPU dumps `Illegal device id: -644371744` on every boot, with a random number that changes each run. It smells like uninitialized stack memory from a mile away.
Read →
-
Lucebox on Olares One — Episode 4: The llama-server submodule serves it up to you 1h later
test_dflash compiles, great. But to serve over HTTP I need llama-server, which is built from the submodule. And the submodule has its own CMake invocation, where I forgot to add -rpath-link. And boom, 1h later, here we go again.
Read →
-
Lucebox on Olares One — Episode 3: LIBRARY_PATH isn't what you think it is
We added LIBRARY_PATH and a libcuda.so.1 symlink, fired off another 2h compile, and ld threw the same error. Why? Because LIBRARY_PATH doesn't resolve indirect dependencies. You need -Wl,-rpath-link.
Read →
-
Lucebox on Olares One — Episode 2: 2h of CUDA compile for 11 undefined references
First Docker build. 2h13 of CUDA compile for sm_120, and at link time, ld dumps 11 undefined references to cuMemCreate, cuMemMap, cuMemAddressReserve. Why? Because libcuda.so.1 isn't where it should be.
Read →