Lambda expressions with CUDA

Question

If I use thrust::transform with the thrust::host execution policy, the lambda works fine:

thrust::transform(thrust::host, a, a+arraySize,b,d,[](int a, int b)->int
{
    return a + b;
});

However, if I change thrust::host to thrust::device, the code no longer compiles. Here is the error from VS2013:

The closure type for a lambda ("lambda [](int, int)->int") cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function

So the question is: how do I use __device__ or __global__ with a lambda so that it can be passed to a thrust algorithm running on the device?

Robert Crovella · Accepted Answer

In CUDA 7 it is not possible. Quoting from Mark Harris:

That isn't supported today in CUDA, because the lambda is host code. Passing lambdas from host to device is a challenging problem, but it is something we will investigate for a future CUDA release.

What you can do in CUDA 7 is call thrust algorithms from your device code, and in that case you can pass lambdas to them...

With CUDA 7, thrust algorithms can be called from device code (e.g. CUDA kernels, or __device__ functors). In those situations, you can use (device) lambdas with thrust. An example is given in the parallelforall blog post here.
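A minimal sketch of that approach, applied to the addition from the question (this is not the blog post's code). The names add_kernel and add_on_device are hypothetical, error checking is omitted for brevity, and it assumes C++11 is enabled (for example with -std=c++11 where it is not the default). Because the lambda is written inside a __global__ function, it is device code and no extra nvcc flag is needed.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/transform.h>

// The lambda is defined inside a __global__ function, so it is compiled as device code.
__global__ void add_kernel(const int* a, const int* b, int* d, int n)
{
    // thrust::seq runs the algorithm sequentially within the calling thread.
    thrust::transform(thrust::seq, a, a + n, b, d,
                      [](int x, int y) -> int { return x + y; });
}

// Hypothetical host helper: copies the inputs to the device, launches a single
// thread that calls thrust::transform, and copies the result back.
std::vector<int> add_on_device(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(int);

    int *da = nullptr, *db = nullptr, *dd = nullptr;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dd, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);

    add_kernel<<<1, 1>>>(da, db, dd, n);

    std::vector<int> d(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(d.data(), dd, bytes, cudaMemcpyDeviceToHost);

    cudaFree(da);
    cudaFree(db);
    cudaFree(dd);
    return d;
}

int main()
{
    const std::vector<int> d = add_on_device({1, 2, 3, 4, 5}, {10, 20, 30, 40, 50});
    for (int v : d) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```

Note that a single thread does all the work here, so this only illustrates where the lambda may legally appear; it is not a replacement for a host-side call with thrust::device.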

However, CUDA 7.5 introduces an experimental device lambda feature. This feature is described here:

CUDA 7.5 introduces an experimental feature: GPU lambdas. GPU lambdas are anonymous device function objects that you can define in host code, by annotating them with a __device__ specifier.

To enable this feature (currently, with CUDA 7.5), it is necessary to specify --expt-extended-lambda on the nvcc compile command line.

Vitality · Answer

This simple code using device lambdas works under CUDA 8.0 RC, although device lambdas are still experimental in this version of CUDA:

#include <iostream>                 // std::cout, std::endl
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

using namespace thrust::placeholders;

int main(void)
{
    // --- Input data 
    float a = 2.0f;
    float x[4] = { 1, 2, 3, 4 };
    float y[4] = { 1, 1, 1, 1 };

    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    thrust::transform(X.begin(), 
                      X.end(),  
                      Y.begin(), 
                      Y.begin(),
                      [=] __host__ __device__ (float x, float y) { return a * x + y; }      // --- Lambda expression 
                     );        

    for (size_t i = 0; i < 4; i++) std::cout << a << " * " << x[i] << " + " << y[i] << " = " << Y[i] << std::endl;

    return 0;
}

Remember to use

--expt-extended-lambda

for compilation.

Tags: c++, c++11, lambda, cuda

spiritsaway
