How do I retrieve the parameter list information for a CUDA 4.0+ kernel?

Tags: cuda

According to the NVIDIA documentation for the cuLaunchKernel function, kernels compiled with CUDA 3.2+ contain information about their parameter list. Is there a way to retrieve this information programmatically from a CUfunction handle? Specifically, I need the number of arguments of a kernel and the size of each argument in bytes. The documentation referenced above says this information exists, but I haven't found anything in the CUDA documentation that describes a programmatic way to access it.

To add a little more explanation: I'm working on a middleware system. Its front-end library replaces libcuda (the driver API library) on the target system. The back end runs as a daemon on another host, the one that has the GPGPU resource being used, and calls into the real libcuda on that machine. Other middleware solutions already do this with cuLaunchKernel, so it's definitely possible. Also, CUDA itself uses this information to know how to parse the parameters from the pointer that you pass into cuLaunchKernel.

Edit: I originally had the CUDA version where this metadata was introduced listed incorrectly. It was 3.2, not 4.0, according to the cuLaunchKernel documentation.

reirab asked Nov 03 '22 09:11


2 Answers

cuLaunchKernel is designed to launch kernels for which you know the function prototype. There is no API for "reverse engineering" the function prototype.
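To illustrate what "knowing the prototype" means for a forwarding layer, here is a minimal sketch in plain C++ (no CUDA headers). cuLaunchKernel takes its kernelParams argument as an array of pointers, one per kernel argument, so a caller that knows each argument's size can flatten that array into a byte buffer. KernelSignature and packKernelParams are hypothetical names invented for this example, and the signature is assumed to be supplied out of band; it is not read from the CUfunction handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical description of a kernel's parameter list. It is assumed to be
// provided by whoever knows the prototype, NOT queried from a CUfunction.
struct KernelSignature {
    std::vector<std::size_t> argSizes;  // size in bytes of each argument
};

// Flatten a cuLaunchKernel-style kernelParams array (one pointer per
// argument) into one contiguous buffer, e.g. for sending to a remote host.
// No alignment padding is added: this is a wire format, not a device layout.
std::vector<std::uint8_t> packKernelParams(const KernelSignature& sig,
                                           void* const* kernelParams) {
    std::vector<std::uint8_t> wire;
    for (std::size_t i = 0; i < sig.argSizes.size(); ++i) {
        assert(kernelParams[i] != nullptr);
        const std::size_t n = sig.argSizes[i];
        const std::size_t offset = wire.size();
        wire.resize(offset + n);
        // Copy the bytes of argument i behind the previously packed ones.
        std::memcpy(wire.data() + offset, kernelParams[i], n);
    }
    return wire;
}
```

For a kernel declared as taking (int, double), the caller would build KernelSignature{{sizeof(int), sizeof(double)}} and pass the same void* array it would otherwise hand to cuLaunchKernel. Without the sizes, the loop has no way to know how many bytes to copy per argument, which is exactly the information the question is after.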

harrism answered Nov 09 '22 10:11


I'm working on the same issue (I don't know whether you have solved it in the meantime). I'm using a known kernel to investigate how the memory pointed to by the CUfunction handle is used. This is the version with no parameters:

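#include <cstdio>

// extern "C" disables C++ name mangling, so the kernel can be looked up
// by the plain name "HelloWorld" (e.g. with cuModuleGetFunction).
extern "C" {
    __global__ void HelloWorld() {
        // Global thread index; unused, the kernel only exists for inspection.
        int thid = (blockIdx.x * blockDim.x) + threadIdx.x;
    }
}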

This is the one-parameter version; versions with more parameters follow the same pattern:

#include <cstdio>

extern "C" {
    // Same kernel with a single 4-byte int parameter, for comparison.
    __global__ void HelloWorld(int a) {
        int thid = (blockIdx.x * blockDim.x) + threadIdx.x;
    }
}

I suggest dumping the first 1024 bytes of the memory pointed to by the CUfunction handle and following the pointers. For example, at offset 0x30 there is a pointer to a table of pointers. I noticed that the size of the struct pointed to by CUfunction doesn't change with the number of function parameters, so the table we are looking for has to be found by following the pointers.
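A minimal sketch of the two helpers such an investigation needs, written in plain C++ so it can be tried on any buffer first. hexDump and wordAt are hypothetical names made up for this example. The layout behind a CUfunction handle is undocumented and may differ between driver versions, and nothing guarantees that 1024 bytes are readable behind the handle, so treat this as an experiment rather than a supported technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Format `len` bytes starting at `base` as a hex dump, 16 bytes per row,
// each row prefixed with its offset from `base`.
std::string hexDump(const void* base, std::size_t len) {
    assert(base != nullptr);
    const auto* p = static_cast<const unsigned char*>(base);
    std::string out;
    char buf[16];
    for (std::size_t off = 0; off < len; off += 16) {
        std::snprintf(buf, sizeof buf, "0x%04zx:", off);
        out += buf;
        for (std::size_t i = off; i < off + 16 && i < len; ++i) {
            std::snprintf(buf, sizeof buf, " %02x", static_cast<unsigned>(p[i]));
            out += buf;
        }
        out += '\n';
    }
    return out;
}

// Read the pointer-sized word stored at byte offset `off` from `base`.
// memcpy avoids unaligned or type-punned reads. Whether the result is a
// valid pointer is up to the caller to judge before following it.
std::uintptr_t wordAt(const void* base, std::size_t off) {
    std::uintptr_t value = 0;
    std::memcpy(&value, static_cast<const unsigned char*>(base) + off,
                sizeof value);
    return value;
}
```

With a real handle (here called hfunc, an assumed variable of type CUfunction), the calls would look like hexDump(reinterpret_cast<const void*>(hfunc), 1024) and wordAt(reinterpret_cast<const void*>(hfunc), 0x30). Comparing the dumps of the no-parameter and one-parameter kernels shows which words change.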

Raffaele Montella answered Nov 09 '22 08:11