Hi, I just wanted to know whether it is possible to do the following inside an NVIDIA CUDA kernel:
__global__ void compute(long *c1, long size, ...)
{
...
long d[1000];
...
}
or the following:
__global__ void compute(long *c1, long size, ...)
{
...
long d[size];
...
}
You can do the first one, but beware that this array is allocated separately in EVERY thread! At 1000 longs (8 KB) it is far too large for registers, so it will live in slow per-thread local memory, and the total footprint multiplies across however many threads you launch.
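A minimal sketch of the first variant (the loop body is just placeholder work I made up to have something compilable):

__global__ void compute(long *c1, long size)
{
    // Private to each thread; 8 KB per thread, placed in local memory.
    long d[1000];
    long tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < 1000; ++i)
        d[i] = tid + i;              // placeholder work
    if (tid < size)
        c1[tid] = d[tid % 1000];
}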
Your second snippet won't compile: variable-length arrays are not supported in device code. (Devices of compute capability 2.0 and later do support in-kernel malloc/free, but the VLA syntax itself is still rejected.)
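If the size is only known at launch time, one common workaround is to allocate a single scratch buffer from the host and give each thread its own slice of it. A sketch, with illustrative names (scratch, numThreads are mine, not from your code):

__global__ void compute(long *c1, long *scratch, long size)
{
    long tid = blockIdx.x * blockDim.x + threadIdx.x;
    long *d = &scratch[tid * size];  // this thread's private slice
    for (long i = 0; i < size; ++i)
        d[i] = tid + i;              // placeholder work
    c1[tid] = d[size - 1];
}

// Host side:
// long *scratch;
// cudaMalloc(&scratch, (size_t)numThreads * size * sizeof(long));
// compute<<<blocks, threadsPerBlock>>>(c1, scratch, size);

Dynamic shared memory (extern __shared__, sized via the third launch parameter) is another option if the array can be shared per block rather than per thread.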