If I use thrust::transform with thrust::host, the lambda works fine:
thrust::transform(thrust::host, a, a + arraySize, b, d, [](int a, int b) -> int
{
    return a + b;
});
However, if I change thrust::host to thrust::device, the code no longer compiles. Here is the error from VS2013:
The closure type for a lambda ("lambda [](int, int)->int") cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function
So the question is: how do I use __device__ or __global__ with lambdas so that they can run on the device?
In CUDA 7 it is not possible. Quoting from Mark Harris:
That isn't supported today in CUDA, because the lambda is host code. Passing lambdas from host to device is a challenging problem, but it is something we will investigate for a future CUDA release.
What you can do in CUDA 7 is call thrust algorithms from your device code, and in that case you can pass lambdas to them...
With CUDA 7, thrust algorithms can be called from device code (e.g. CUDA kernels, or __device__ functors). In those situations, you can use (device) lambdas with thrust. An example is given in the parallelforall blog post here.
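As a rough illustration of calling thrust from device code, here is a minimal sketch. It assumes CUDA 7 or later and compilation with -std=c++11; the kernel name add_kernel and the sample data are made up for this example, and error checking is omitted for brevity. Because the lambda is written inside a __global__ function, its closure type is device code, so no experimental flag should be needed. The thrust::seq policy runs the algorithm sequentially in the calling thread.

```cuda
#include <cstdio>
#include <thrust/execution_policy.h>
#include <thrust/transform.h>

// Hypothetical kernel: a single thread calls thrust::transform sequentially.
__global__ void add_kernel(const int *a, const int *b, int *d, int n)
{
    // The lambda is defined in device code, so it can be passed to thrust here.
    thrust::transform(thrust::seq, a, a + n, b, d,
                      [](int x, int y) { return x + y; });
}

int main()
{
    const int n = 4;
    int h_a[n] = { 1, 2, 3, 4 };
    int h_b[n] = { 10, 20, 30, 40 };
    int h_d[n] = { 0, 0, 0, 0 };

    // Allocate device buffers and copy the inputs over.
    int *a, *b, *d;
    cudaMalloc(&a, n * sizeof(int));
    cudaMalloc(&b, n * sizeof(int));
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(a, h_a, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(b, h_b, n * sizeof(int), cudaMemcpyHostToDevice);

    // One block, one thread: the work inside the kernel is sequential.
    add_kernel<<<1, 1>>>(a, b, d, n);

    // This copy also waits for the kernel to finish.
    cudaMemcpy(h_d, d, n * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; i++) printf("%d + %d = %d\n", h_a[i], h_b[i], h_d[i]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(d);
    return 0;
}
```

A single-thread launch is only meant to show that the lambda compiles and runs on the device. It gives no parallel speed-up, so it is not a drop-in replacement for thrust::device on the host side.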
However, CUDA 7.5 introduces an experimental device lambda feature. This feature is described here:
CUDA 7.5 introduces an experimental feature: GPU lambdas. GPU lambdas are anonymous device function objects that you can define in host code, by annotating them with a __device__ specifier.
To enable this feature (currently, with CUDA 7.5), you need to specify --expt-extended-lambda on the nvcc compile command line.
This simple code using device lambdas works under CUDA 8.0 RC, although device lambdas are still experimental in this version of CUDA:
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>
using namespace thrust::placeholders;

int main(void)
{
    // --- Input data
    float a = 2.0f;
    float x[4] = { 1, 2, 3, 4 };
    float y[4] = { 1, 1, 1, 1 };

    // --- Copy the host arrays to the device
    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    // --- Compute Y = a * X + Y on the device
    thrust::transform(X.begin(),
                      X.end(),
                      Y.begin(),
                      Y.begin(),
                      [=] __host__ __device__ (float x, float y) { return a * x + y; } // --- Lambda expression
    );

    // --- x and y still hold the original host values; Y holds the device result
    for (size_t i = 0; i < 4; i++) std::cout << a << " * " << x[i] << " + " << y[i] << " = " << Y[i] << std::endl;

    return 0;
}
Remember to use --expt-extended-lambda for compilation.
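If the experimental flag is not an option, the usual lambda-free alternative is a named functor whose operator() is marked __host__ __device__. The sketch below reworks the same a * x + y computation that way. The struct name saxpy_functor is made up for this example, and the code is an illustration under those assumptions rather than part of the original answers.

```cuda
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Hypothetical functor: plays the role of the [=] lambda, with 'a' stored as a member.
struct saxpy_functor
{
    float a;

    explicit saxpy_functor(float a_) : a(a_) {}

    // Callable on both host and device, so thrust can use it with device vectors.
    __host__ __device__ float operator()(float x, float y) const
    {
        return a * x + y;
    }
};

int main(void)
{
    float x[4] = { 1, 2, 3, 4 };
    float y[4] = { 1, 1, 1, 1 };

    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    // Y = 2 * X + Y, computed on the device without any lambda.
    thrust::transform(X.begin(), X.end(), Y.begin(), Y.begin(), saxpy_functor(2.0f));

    // Prints 3, 5, 7 and 9, one value per line.
    for (size_t i = 0; i < 4; i++) std::cout << Y[i] << std::endl;

    return 0;
}
```

This form needs no special nvcc flag. The trade-off is the extra boilerplate of declaring the struct away from the call site.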