I'm trying to intercept CUDA driver API calls (like cuInit) by creating a proxy library (libcuda_override.so) that forwards calls to the real libcuda.so. My current approach:
Setup:
libcuda.so.1 symlinks to my proxy (libcuda_override.so).
The proxy dynamically loads the real library (libcuda_real.so.1).
I manually intercept cuInit and forward other calls to the real library.
Problem: When running a Python script with LD_LIBRARY_PATH=./, I get symbol lookup errors for other CUDA functions (e.g., cuGetProcAddress_v2). The dynamic linker expects all CUDA symbols to be defined in my proxy, but I only want to intercept specific functions (like cuInit).
What I Want:
Only resolve intercepted functions (e.g., cuInit) in my proxy.
Automatically forward all other symbols to the real libcuda.so without manual declarations.
Constraints:
Avoid parsing cuda.h or manually declaring every CUDA function.
Prefer a solution that doesn’t require maintaining a full symbol list.
Current Code:
#include <cuda.h>      // CUresult, CUDA_ERROR_UNKNOWN
#include <dlfcn.h>     // dlopen, dlsym, dlerror
#include <unistd.h>    // write, _exit, STDERR_FILENO
#include <cstdio>      // fprintf
#include <cstring>     // strlen
#include <iostream>    // std::cout
#include <mutex>       // std::once_flag, std::call_once

static void* real_libcuda = nullptr;
static std::once_flag libcuda_loaded;

// Load the real driver library exactly once, however many callers race here.
static void ensure_real_libcuda_loaded() {
    std::call_once(libcuda_loaded, []() {
        const char* msg = "[🔥 LIBCUDA WRAP LOADED]\n";
        write(STDERR_FILENO, msg, strlen(msg));
        const char* libcuda_path = "libcuda_real.so.1";
        real_libcuda = dlopen(libcuda_path, RTLD_LAZY | RTLD_GLOBAL);
        if (!real_libcuda) {
            fprintf(stderr, "[‼️] Failed to load real libcuda.so.1 from %s: %s\n", libcuda_path, dlerror());
            _exit(1);
        }
    });
}

// Load the real library as soon as the proxy itself is loaded.
__attribute__((constructor))
static void libcuda_wrap_ctor() {
    ensure_real_libcuda_loaded();
}

// Look up `name` in the real library and cast it to the function pointer type T.
template<typename T>
T resolve(const char* name) {
    ensure_real_libcuda_loaded();
    void* sym = dlsym(real_libcuda, name);
    if (!sym) {
        fprintf(stderr, "[‼️] dlsym failed for %s: %s\n", name, dlerror());
    }
    return reinterpret_cast<T>(sym);
}

// The only intercepted entry point: log the call, then forward it.
extern "C" __attribute__((visibility("default")))
CUresult cuInit(unsigned int Flags) {
    static auto real = resolve<CUresult(*)(unsigned int)>("cuInit");
    std::cout << "[🟢 cuInit intercepted]\n";
    return real ? real(Flags) : CUDA_ERROR_UNKNOWN;
}
Error:
./libcuda.so.1: undefined symbol: cuGetProcAddress_v2 (fatal)
Question: Is there a way to explicitly intercept only cuInit (or a subset of functions), and implicitly forward all other symbols to the real libcuda.so without declaring them?
You may be holding it wrong.
Instead of making libcuda.so.1 a symlink to your interposer, leave libcuda.so.1 as the real library and preload the interposer: LD_PRELOAD=/path/to/libcuda_override.so /path/to/binary.
That way, the binary will bind to your cuInit() in libcuda_override.so, but will bind to cuGetProcAddress_v2 in the (real) libcuda.so.1.
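A minimal sketch of what the preloaded interposer could look like, under some assumptions. With LD_PRELOAD the interposer no longer needs to dlopen the real library by name; it can ask for the next definition of cuInit in the search order with dlsym(RTLD_NEXT, ...). The CUresult typedef and the kCudaErrorUnknown constant here are stand-ins so the snippet compiles without the CUDA headers; in a real build you would include cuda.h and use CUDA_ERROR_UNKNOWN instead.

```cpp
// Build (assumed command): g++ -shared -fPIC -o libcuda_override.so override.cpp -ldl
#include <dlfcn.h>
#include <cstdio>

// Assumption: stand-ins for the definitions in cuda.h.
typedef int CUresult;
static const CUresult kCudaErrorUnknown = 999;  // value of CUDA_ERROR_UNKNOWN

extern "C" __attribute__((visibility("default")))
CUresult cuInit(unsigned int flags) {
    using cuInit_fn = CUresult (*)(unsigned int);
    // RTLD_NEXT: find the next cuInit after this object in the lookup order,
    // which is the one in the real libcuda.so.1 if it is loaded.
    static cuInit_fn real =
        reinterpret_cast<cuInit_fn>(dlsym(RTLD_NEXT, "cuInit"));
    std::fprintf(stderr, "[cuInit intercepted]\n");
    // If no other cuInit is found, report an error instead of crashing.
    return real ? real(flags) : kCudaErrorUnknown;
}
```

Every symbol the interposer does not define (cuGetProcAddress_v2 and the rest) is simply not found in it, so the dynamic linker moves on and resolves it from the real library. No symbol list is needed.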
This will not work if the binary does not call cuInit() directly, but instead does h = dlopen("libcuda.so.1", ...) and sym = dlsym(h, "cuInit");.
But I am not sure your application binary has that problem.
If it does turn out to be a problem, you may be able to use the la_symbind*() hooks from the rtld-audit interface (LD_AUDIT), but that is quite complicated.
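For reference, a rough sketch of the audit-library route, following the interface described in the rtld-audit(7) man page. Treat it as an outline rather than a tested solution: DriverResult, hooked_cuInit and the library name libcuda_audit.so are hypothetical names, only the 64-bit hook is shown, and a real audit library has to cope with running in its own link-map namespace.

```cpp
// Build (assumed command): g++ -shared -fPIC -o libcuda_audit.so audit.cpp
// Run (assumed command):   LD_AUDIT=/path/to/libcuda_audit.so /path/to/binary
#include <link.h>    // rtld-audit declarations, Elf64_Sym, LA_FLG_* flags
#include <cstdint>
#include <cstdio>
#include <cstring>

using DriverResult = int;  // assumption: stand-in for CUresult from cuda.h

// Address of the real cuInit, captured when the symbol is first bound.
static DriverResult (*real_cuInit)(unsigned int) = nullptr;

static DriverResult hooked_cuInit(unsigned int flags) {
    std::fprintf(stderr, "[cuInit intercepted via LD_AUDIT]\n");
    return real_cuInit ? real_cuInit(flags) : 999;  // 999 = CUDA_ERROR_UNKNOWN
}

extern "C" {

// Handshake with the dynamic linker: accept the version it offers.
unsigned int la_version(unsigned int version) {
    return version;
}

// Ask to be notified about bindings to and from every loaded object.
unsigned int la_objopen(struct link_map* map, Lmid_t lmid, uintptr_t* cookie) {
    (void)map; (void)lmid; (void)cookie;
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

// Called when a symbol is bound; the return value is the address actually used.
uintptr_t la_symbind64(Elf64_Sym* sym, unsigned int ndx,
                       uintptr_t* refcook, uintptr_t* defcook,
                       unsigned int* flags, const char* symname) {
    (void)ndx; (void)refcook; (void)defcook; (void)flags;
    if (std::strcmp(symname, "cuInit") == 0) {
        // Remember the real address, then redirect the binding to the hook.
        real_cuInit =
            reinterpret_cast<DriverResult (*)(unsigned int)>(sym->st_value);
        return reinterpret_cast<uintptr_t>(&hooked_cuInit);
    }
    return sym->st_value;  // every other symbol binds as usual
}

}  // extern "C"
```

Only the intercepted names appear in the strcmp check; every other symbol falls through to the address the linker already chose, so there is still no full symbol list to maintain.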