How to Proxy CUDA Functions Without Explicitly Declaring Them All?

I'm trying to intercept CUDA driver API calls (like cuInit) by creating a proxy library (libcuda_override.so) that forwards calls to the real libcuda.so. My current approach:

Setup:

  • libcuda.so.1 is a symlink to my proxy (libcuda_override.so).

  • The proxy dynamically loads the real library (libcuda_real.so.1).

  • I manually intercept cuInit and forward other calls to the real library.

Problem: When running a Python script with LD_LIBRARY_PATH=./, I get symbol lookup errors for other CUDA functions (e.g., cuGetProcAddress_v2). The dynamic linker expects all CUDA symbols to be defined in my proxy, but I only want to intercept specific functions (like cuInit).

What I Want:

  • Only resolve intercepted functions (e.g., cuInit) in my proxy.

  • Automatically forward all other symbols to the real libcuda.so without manual declarations.

Constraints:

  • Avoid parsing cuda.h or manually declaring every CUDA function.

  • Prefer a solution that doesn’t require maintaining a full symbol list.

Current Code:


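#include <cuda.h>     // CUresult, CUDA_ERROR_UNKNOWN
#include <dlfcn.h>    // dlopen, dlsym, dlerror
#include <unistd.h>   // write, _exit, STDERR_FILENO
#include <cstdio>     // fprintf
#include <iostream>   // std::cout
#include <mutex>      // std::once_flag, std::call_once

static void* real_libcuda = nullptr;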
static std::once_flag libcuda_loaded;

static void ensure_real_libcuda_loaded() {
    std::call_once(libcuda_loaded, []() {
        const char* msg = "[🔥 LIBCUDA WRAP LOADED]\n";

        size_t msg_len = 0;
        while (msg[msg_len] != '\0') {
            ++msg_len;
        }
        write(STDERR_FILENO, msg, msg_len);

        const char* libcuda_path = "libcuda_real.so.1";
        real_libcuda = dlopen(libcuda_path, RTLD_LAZY | RTLD_GLOBAL);
        if (!real_libcuda) {
            fprintf(stderr, "[‼️] Failed to load real libcuda.so.1 from %s: %s\n", libcuda_path, dlerror());
            _exit(1);
        }
    });
}

__attribute__((constructor))
static void libcuda_wrap_ctor() {
    ensure_real_libcuda_loaded();
}

template<typename T>
T resolve(const char* name) {
    ensure_real_libcuda_loaded();
    void* sym = dlsym(real_libcuda, name);
    if (!sym) {
        fprintf(stderr, "[‼️] dlsym failed for %s: %s\n", name, dlerror());
    }
    return reinterpret_cast<T>(sym);
}


extern "C" __attribute__((visibility("default")))
CUresult cuInit(unsigned int Flags) {
    static auto real = resolve<CUresult(*)(unsigned int)>("cuInit");
    std::cout << "[🟢 cuInit intercepted]\n";
    return real ? real(Flags) : CUDA_ERROR_UNKNOWN;
}

Error: ./libcuda.so.1: undefined symbol: cuGetProcAddress_v2 (fatal)

Question: Is there a way to explicitly intercept only cuInit (or a subset of functions), and implicitly forward all other symbols to the real libcuda.so without declaring them?

asked Oct 26 '25 05:10 by D1_


1 Answer

"Prefer a solution that doesn’t require maintaining a full symbol list."

You may be holding it wrong.

Instead of making libcuda.so.1 a symlink to your interposer, run the binary with LD_PRELOAD=/path/to/libcuda_override.so /path/to/binary.

That way, the binary will bind to your cuInit() in libcuda_override.so, but will bind to cuGetProcAddress_v2 in the (real) libcuda.so.1.
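A minimal sketch of what the interposer might look like when it is loaded via LD_PRELOAD: rather than dlopen()ing a renamed copy of the driver, it can look up the real cuInit with dlsym(RTLD_NEXT, ...), which returns the next definition of that symbol in search order after the interposer itself. The CUresult alias and kCudaErrorUnknown constant are stand-ins assumed here so the sketch compiles without cuda.h; in the real library, include <cuda.h> and use the genuine types.

```cpp
#include <dlfcn.h>    // dlsym, RTLD_NEXT (g++ predefines _GNU_SOURCE)
#include <atomic>     // std::atomic
#include <cstdio>     // std::fprintf

// Hypothetical stand-ins so the sketch builds without <cuda.h>.
// In the real interposer, include <cuda.h> and drop these two lines.
using CUresult = int;
constexpr CUresult kCudaErrorUnknown = 999;  // numeric value of CUDA_ERROR_UNKNOWN

using cuInit_fn = CUresult (*)(unsigned int);

extern "C" __attribute__((visibility("default")))
CUresult cuInit(unsigned int Flags) {
    // Look up the next "cuInit" after this preloaded object, i.e. the one in
    // the real libcuda.so.1. A null result is not cached, so a later call can
    // still find the real function once the driver library is loaded.
    static std::atomic<cuInit_fn> real{nullptr};
    cuInit_fn fn = real.load(std::memory_order_acquire);
    if (fn == nullptr) {
        fn = reinterpret_cast<cuInit_fn>(dlsym(RTLD_NEXT, "cuInit"));
        real.store(fn, std::memory_order_release);
    }
    std::fprintf(stderr, "[cuInit intercepted]\n");
    return fn ? fn(Flags) : kCudaErrorUnknown;
}
```

Built as a shared object (for example with g++ -shared -fPIC, adding -ldl on glibc older than 2.34) and started as LD_PRELOAD=/path/to/libcuda_override.so /path/to/binary, this library defines only cuInit. Every other cu* symbol the binary references still resolves to the real libcuda.so.1, so no symbol list has to be maintained and no renamed copy of the driver is needed.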


This will not work if the binary does not call cuInit() directly, but instead does h = dlopen("libcuda.so.1", ...) and sym = dlsym(h, "cuInit");.

But I am not sure your application binary has that problem.

If it is in fact a problem, you may be able to use the la_symbind*() hooks from the rtld-audit interface (LD_AUDIT), but that is quite complicated.
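A rough sketch of the la_symbind route, assuming a 64-bit glibc target: an audit library (loaded with LD_AUDIT=/path/to/libcuda_audit.so, a hypothetical file name) lets every binding through unchanged except cuInit, which it redirects to a hook that then calls the original. glibc also reports dlsym() results to la_symbind64() (with LA_SYMB_DLSYM set in *flags), which is what makes this usable for the dlopen()/dlsym() pattern above. The CUresult alias and cuInit_hook name are assumptions for illustration.

```cpp
#include <link.h>     // la_* prototypes, LAV_CURRENT, LA_FLG_*, Elf64_Sym, Lmid_t
#include <cstdint>    // uintptr_t
#include <cstdio>     // std::fprintf
#include <cstring>    // std::strcmp

// Hypothetical stand-in for CUresult (use <cuda.h> in real code).
using CUresult = int;
using cuInit_fn = CUresult (*)(unsigned int);

// Address of the real cuInit, captured when the linker reports the binding.
static cuInit_fn real_cuInit = nullptr;

// Replacement address handed back to whoever binds "cuInit" (hypothetical name).
static CUresult cuInit_hook(unsigned int flags) {
    std::fprintf(stderr, "[cuInit intercepted via LD_AUDIT]\n");
    return real_cuInit ? real_cuInit(flags) : 999;  // 999 == CUDA_ERROR_UNKNOWN
}

extern "C" {

// Version handshake; returning 0 would make ld.so ignore this audit library.
unsigned int la_version(unsigned int /*version*/) {
    return LAV_CURRENT;
}

// Request binding callbacks to and from every loaded object.
unsigned int la_objopen(struct link_map* /*map*/, Lmid_t /*lmid*/,
                        uintptr_t* /*cookie*/) {
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

// Called for each audited binding (32-bit targets use la_symbind32 instead).
// Returning sym->st_value keeps the normal binding; returning another
// address redirects the caller to it.
uintptr_t la_symbind64(Elf64_Sym* sym, unsigned int /*ndx*/,
                       uintptr_t* /*refcook*/, uintptr_t* /*defcook*/,
                       unsigned int* /*flags*/, const char* symname) {
    if (std::strcmp(symname, "cuInit") == 0) {
        real_cuInit = reinterpret_cast<cuInit_fn>(sym->st_value);
        return reinterpret_cast<uintptr_t>(&cuInit_hook);
    }
    return sym->st_value;  // every other symbol binds to the real definition
}

}  // extern "C"
```

Returning LA_FLG_BINDTO | LA_FLG_BINDFROM for every object is the simplest choice, but it audits every binding in the process; a real version might inspect map->l_name and request LA_FLG_BINDTO only for libcuda.so.1. The audit library is also loaded into its own link-map namespace, which is part of why this route is more complicated than LD_PRELOAD.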

answered Oct 27 '25 19:10 by Employed Russian


