
CUDA function pointers

I was trying to do something like this in CUDA (I actually need to write some integration functions):

#include <iostream>
using namespace std;

float f1(float x) {
    return x * x;
}

float f2(float x) {
    return x;
}

void tabulate(float p_f(float)) {
    for (int i = 0; i != 10; ++i) {
        std::cout << p_f(i) << ' ';
    }
    std::cout << std::endl;
}

int main() {
    tabulate(f1);
    tabulate(f2);
    return 0;
}

output:

0 1 4 9 16 25 36 49 64 81
0 1 2 3 4 5 6 7 8 9


I tried the following, but only got this error:

Error: Function pointers and function template parameters are not supported in sm_1x.

float f1(float x) {
    return x;
}

__global__ void tabulate(float lower, float upper, float p_function(float), float* result) {
    for (lower; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res;
    float* dev_res;

    cudaMalloc( (void**)&dev_res, sizeof(float) ) ;

    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);

    printf("%f\n", res);
    /************************************************************************/
    scanf("%s");

    return 0;
}
DanilGholtsman asked Mar 26 '13 18:03



2 Answers

To get rid of the compile error, pass -gencode arch=compute_20,code=sm_20 to the compiler when building your code. But then you'll likely run into problems at runtime:

Taken from the CUDA Programming Guide http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions

Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.

It is not allowed to take the address of a __device__ function in host code.

So you can have something like this (adapted from the "FunctionPointers" sample):

//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float
typedef unsigned char(*pointFunction_t)(unsigned char, float);

//some device function to be pointed to
__device__ unsigned char
Threshold(unsigned char in, float thresh)
{
   ...
}

//pComputeThreshold is a device-side function pointer to your __device__ function
__device__ pointFunction_t pComputeThreshold = Threshold;
//the host-side function pointer to your __device__ function
pointFunction_t h_pointFunction;

//in host code: copy the function pointers to their host equivalent
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t));

You can then pass the h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function.

//your kernel taking your __device__ function pointer as a parameter
__global__ void kernel(pointFunction_t pPointOperation)
{
    unsigned char tmp;
    ...
    tmp = (*pPointOperation)(tmp, 150.0f);
    ...
}

//invoke the kernel in host code, passing in your host-side __device__ function pointer
kernel<<<...>>>(h_pointFunction);

Hopefully that made some sense. In short, you would have to change your f1 function to be a __device__ function and follow a similar procedure to get a valid function pointer on the host side that you can pass to your kernel (the typedefs aren't necessary, but they do make the code nicer). I'd also advise looking over the FunctionPointers CUDA sample.
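
Applied to the code in the question, the procedure might look like the sketch below. It assumes a device of compute capability 2.0 or higher and the matching -gencode flag. The names func_t, p_f1, p_f2, h_f1, and h_f2 are made up for this illustration, and f2 is the identity function from the question's host-only example. The kernel sums into a local variable first, because the question's version accumulated into *result without ever initializing dev_res.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Function pointer type: takes a float, returns a float (illustrative name).
typedef float (*func_t)(float);

// The functions to tabulate must be __device__ functions.
__device__ float f1(float x) { return x * x; }
__device__ float f2(float x) { return x; }

// Device-side pointers; the address is taken in device code, which is allowed.
__device__ func_t p_f1 = f1;
__device__ func_t p_f2 = f2;

// Sums p_function(x) for x = lower, lower + 1, ... while x < upper.
__global__ void tabulate(float lower, float upper, func_t p_function, float* result) {
    float sum = 0.0f;
    for (float x = lower; x < upper; x += 1.0f) {
        sum += p_function(x);
    }
    *result = sum;
}

int main() {
    // Copy the device-side pointer values into host variables.
    func_t h_f1, h_f2;
    cudaMemcpyFromSymbol(&h_f1, p_f1, sizeof(func_t));
    cudaMemcpyFromSymbol(&h_f2, p_f2, sizeof(func_t));

    float* dev_res;
    cudaMalloc((void**)&dev_res, sizeof(float));

    float res = 0.0f;

    tabulate<<<1, 1>>>(0.0f, 5.0f, h_f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("f1: %f\n", res);  // sum of x * x for x = 0..4

    tabulate<<<1, 1>>>(0.0f, 5.0f, h_f2, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("f2: %f\n", res);  // sum of x for x = 0..4

    // Report the last error recorded by the runtime, if any.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(dev_res);
    return 0;
}
```

Note that h_f1 and h_f2 hold device addresses. They are only meaningful as kernel arguments and must not be called from host code.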

alrikai answered Sep 28 '22 06:09



Even though you may be able to compile this code (see @Robert Crovella's answer), it will not work. You cannot pass function pointers from host code, because the host compiler has no way of figuring out the function's address on the device.
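
If obtaining a device address on the host is the sticking point, one possible workaround is to avoid function pointers altogether and pass the operation as a functor through a template parameter. This is a different technique from the one in the question, not a fix for it. Square and Identity below are hypothetical stand-ins for f1 and f2, and the sketch assumes a toolchain and device that support templated kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative functor types standing in for f1 and f2 from the question.
struct Square {
    __device__ float operator()(float x) const { return x * x; }
};
struct Identity {
    __device__ float operator()(float x) const { return x; }
};

// The function to apply is fixed at compile time through the template
// parameter, so no function address crosses the host/device boundary.
template <typename F>
__global__ void tabulate(float lower, float upper, F f, float* result) {
    float sum = 0.0f;
    for (float x = lower; x < upper; x += 1.0f) {
        sum += f(x);
    }
    *result = sum;
}

int main() {
    float res = 0.0f;
    float* dev_res;
    cudaMalloc((void**)&dev_res, sizeof(float));

    // Each functor type instantiates its own copy of the kernel.
    tabulate<<<1, 1>>>(0.0f, 5.0f, Square(), dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Square:   %f\n", res);

    tabulate<<<1, 1>>>(0.0f, 5.0f, Identity(), dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Identity: %f\n", res);

    cudaFree(dev_res);
    return 0;
}
```

The trade-off is that the function can no longer be chosen through a pointer at runtime; the host code has to pick among the instantiated kernels itself.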

Eugene answered Sep 28 '22 06:09
