Episode 1 — we discovered Lucebox and decided to package it for Olares.
Episode 2 — first build, 2h of compile for 11 undefined references to cuMemCreate, cuMemMap, etc.
Fix applied: LIBRARY_PATH pointing at /usr/local/cuda/lib64/stubs + symlink libcuda.so → libcuda.so.1. Logical. I rerun.
2h later
Recompile. Re-link. And then…
/usr/bin/ld: warning: libcuda.so.1, needed by libggml-cuda.so.0.9.11, not found
/usr/bin/ld: libggml-cuda.so.0.9.11: undefined reference to `cuMemCreate'
... 11 identical undefined references
The exact same error. Letter for letter. As if I had done nothing.
Here’s debug rule #1: if you think you fixed the problem but it comes back identical, you didn’t fix the actual problem. Time to read the manual.
What LIBRARY_PATH actually does
LIBRARY_PATH is an environment variable that gcc/clang use to resolve libraries directly referenced by the link command. If you do gcc main.c -lfoo and libfoo.so lives in a directory listed in LIBRARY_PATH, ld will find it. OK.
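But if you do gcc main.c -lbar, and libbar.so itself depends on another lib, libfoo.so, then LIBRARY_PATH doesn’t help. For that second hop, GNU ld skips the -L-style directories (which is what LIBRARY_PATH entries become) and only looks in -rpath-link and -rpath directories, LD_LIBRARY_PATH, any RUNPATH recorded in libbar.so, and its standard system search path (/lib, /usr/lib, /usr/lib/x86_64-linux-gnu, etc.).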
That’s an indirect dependency. And it’s exactly our case: we link test_dflash which depends on libggml-cuda.so which depends on libcuda.so.1. ld will find libggml-cuda.so (direct) but not libcuda.so.1 (indirect) — because it doesn’t look in LIBRARY_PATH for indirects.
The ld warning literally said so:
not found (try using -rpath or -rpath-link)
I had read it but not really understood it. The ld docs confirm:
-rpath-link=DIR: When using ELF or SunOS, one shared library may require another. […] If -rpath-link is specified, the linker will use that for indirect resolution.
Bingo.
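If you don’t want to wait two hours to convince yourself, the direct-versus-indirect behavior can be reproduced in a few seconds with three tiny C files. This is a minimal sketch, assuming a Linux box with cc and GNU ld (other linkers such as lld may behave differently); every file and library name in it is made up for the demo. libfoo.so plays the role of libcuda.so.1 and libbar.so the role of libggml-cuda.so.

```shell
#!/bin/sh
# Throwaway demo: main -> libbar.so (direct) -> libfoo.so (indirect).
# All names are hypothetical; nothing here touches CUDA or Lucebox.
demo=$(mktemp -d)
mkdir -p "$demo/foo" "$demo/bar"

printf 'int foo(void) { return 42; }\n' > "$demo/foo.c"
printf 'int foo(void);\nint bar(void) { return foo(); }\n' > "$demo/bar.c"
printf 'int bar(void);\nint main(void) { return bar() == 42 ? 0 : 1; }\n' > "$demo/main.c"

# Build the two shared libraries; libbar.so records libfoo.so as NEEDED.
cc -shared -fPIC -Wl,-soname,libfoo.so -o "$demo/foo/libfoo.so" "$demo/foo.c"
cc -shared -fPIC -Wl,-soname,libbar.so -o "$demo/bar/libbar.so" "$demo/bar.c" \
    -L"$demo/foo" -lfoo
readelf -d "$demo/bar/libbar.so" | grep NEEDED

# Attempt 1: LIBRARY_PATH lists both directories, but only -lbar is direct.
LIBRARY_PATH="$demo/bar:$demo/foo" \
    cc "$demo/main.c" -lbar -o "$demo/main1" 2> "$demo/attempt1.log"
status_library_path=$?

# Attempt 2: same link, plus -rpath-link for the indirect dependency.
LIBRARY_PATH="$demo/bar" \
    cc "$demo/main.c" -lbar -Wl,-rpath-link,"$demo/foo" \
    -o "$demo/main2" 2> "$demo/attempt2.log"
status_rpath_link=$?

echo "LIBRARY_PATH only: exit $status_library_path"
cat "$demo/attempt1.log"
echo "with -rpath-link:  exit $status_rpath_link"

# -rpath-link only helps at link time: no search path is stored in the binary.
readelf -d "$demo/main2" | grep -E 'RPATH|RUNPATH' || echo "no RPATH/RUNPATH embedded"

# At run time the loader has to find both libraries on its own.
LD_LIBRARY_PATH="$demo/bar:$demo/foo" "$demo/main2"
echo "run: exit $?"
```

With GNU ld I would expect the first attempt to fail with the same "not found" warning plus an undefined reference, and the second to link. The readelf check matters for the CUDA case: because nothing is baked into the binary, the stub is only used to satisfy the linker, and at run time the real libcuda.so.1 from the driver gets loaded instead.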
The real fix
Pass the stubs path to the linker via CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS, not via LIBRARY_PATH:
RUN cmake -B build -S . \
        -DCMAKE_BUILD_TYPE=Release \
        -DGGML_CUDA=ON \
        -DCMAKE_CUDA_ARCHITECTURES="120" \
        -DCMAKE_EXE_LINKER_FLAGS="-Wl,-rpath-link,/usr/local/cuda/lib64/stubs" \
        -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-rpath-link,/usr/local/cuda/lib64/stubs" \
    && cmake --build build --target test_dflash -j $(nproc)
Note that I also went from "86;89;120" to just "120". Why? Because I’m not distributing this image to the world: it’ll only run on my Olares One, whose GPU is sm_120. Building for one architecture instead of three cuts compile time by ~3× without losing anything. If we ever need a wider target, we add it back.
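If you’re not sure which value to put in CMAKE_CUDA_ARCHITECTURES for your own GPU, something like the sketch below can read it from the driver. It’s meant to run on the host, not inside docker build, where there is usually no driver at all (hence the stubs). The cap_to_arch helper is a name I made up for this example, and the compute_cap query field assumes a reasonably recent nvidia-smi.

```shell
#!/bin/sh
# Hypothetical helper: turn a compute capability such as "12.0" into the
# form CMAKE_CUDA_ARCHITECTURES expects ("120") by dropping dots and spaces.
cap_to_arch() {
    printf '%s\n' "$1" | tr -d '. '
}

# Only query the GPU when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    # Take the first GPU's compute capability, e.g. "12.0".
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
    echo "CMAKE_CUDA_ARCHITECTURES=$(cap_to_arch "$cap")"
fi
```

The printed value can then be pasted into the -DCMAKE_CUDA_ARCHITECTURES flag of the cmake line. Hard-coding it, as I did, keeps the Docker build reproducible without needing a GPU at build time.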
56 minutes later
[ 98%] Built target dflash27b
[100%] Building CXX object CMakeFiles/test_dflash.dir/test/test_dflash.cpp.o
[100%] Linking CXX executable test_dflash
[100%] Built target test_dflash
#13 DONE 3337.7s
Yes! Built target test_dflash. The DFlash CLI binary is compiled. 56 min this time (CUDA cache partially reused from the previous build + a single arch). Not bad.
Except test_dflash is just the Lucebox bench CLI. To do real OpenAI-compatible HTTP serving, I need llama-server, which compiles from the deps/llama.cpp submodule of the Lucebox fork. Build #2.
And surprise — it’s not the same cmake invocation, it’s in a sub-project, and I haven’t carried over the -rpath-link. So when I kick off the llama-server build…
(You see where this is going.)
Episode 4: 1h later, ld dumps the exact same 11 undefined references. See you next time!
Disclosure — All the benchmarks in this post run on my own Olares One. If the content was useful and you’re considering one, ordering through this referral link gets you $400 off ($3,599 instead of $3,999) and pays me $200. I’m mentioning this out of transparency — and yes, incidentally, it helps keep the blog alive (hosting, domain, and the time I spend writing here). Link valid until late June 2026.