In the following code, if I bring the #define N 65536 above the #if FSIZE, then I get the following error:
#if FSIZE==1
__global__ void compute_sum1(float *a, float *b, float *c, int N)
{
#define N 65536
int majorIdx = blockIdx.x;
int subIdx = threadIdx.x;
int idx=majorIdx*32+subIdx ;
float sum=0;
int t=4*idx;
if(t<N)
{
c[t]= a[t]+b[t];
c[t+1]= a[t+1]+b[t+1];
c[t+2]= a[t+2]+b[t+2];
c[t+3]= a[t+3]+b[t+3];
}
return;
}
#elif FSIZE==2
__global__ void compute_sum2(float2 *a, float2 *b, float2 *c, int N)
#define N 65536
{
int majorIdx = blockIdx.x;
int subIdx = threadIdx.x;
int idx=majorIdx*32+subIdx ;
float sum=0;
int t=2*idx;
if(t<N)
{
c[t].x= a[t].x+b[t].x;
c[t].y= a[t].y+b[t].y;
c[t+1].x= a[t+1].x+b[t+1].x;
c[t+1].y= a[t+1].y+b[t+1].y;
}
return ;
}
float1vsfloat2.cu(10): error: expected a ")"
This problem is a little annoying, and I would really like to know why it's happening. I have a feeling I'm overlooking something really silly. By the way, this code section is at the top of the file, without even an #include before it. I would really appreciate any explanation.
CUDA (Compute Unified Device Architecture) is a parallel computing platform and API (Application Programming Interface) model developed by Nvidia. It lets programs run computations in parallel on the Graphics Processing Unit (GPU), which can speed up suitable workloads considerably.
CUDA C++ is just one of the ways you can create massively parallel applications with CUDA. It lets you use the powerful C++ programming language to develop high performance algorithms accelerated by thousands of parallel threads running on GPUs.
To compile CUDA source files, you have to use the nvcc compiler. CUDA code can only be compiled and executed on a node that has a GPU. Heracles has 4 Nvidia Tesla P100 GPUs on node18, and the CUDA compiler is installed there, so you need to ssh into node18 to compile CUDA programs.
NVIDIA's CUDA Compiler: each CUDA program combines host code, written in standard C/C++ with some CUDA API extensions, with GPU device kernel functions.
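With #define N 65536 placed above the kernel, the preprocessor replaces every later occurrence of N, including the parameter name. It changes this line: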
__global__ void compute_sum1(float *a, float *b, float *c, int N)
to
__global__ void compute_sum1(float *a, float *b, float *c, int 65536)
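which isn't valid CUDA code: a parameter name must be an identifier, not an integer literal, so the compiler stops at 65536 and reports the expected a ")" error.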
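One way to avoid the collision is to give the macro and the parameter different names. The following is a minimal sketch, not the asker's final code: the macro name NUM_ELEMENTS, the parameter name n, and the main function with unified memory are all assumptions added for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical macro name, chosen so it cannot match any identifier below.
#define NUM_ELEMENTS 65536

// The parameter is now n, so the preprocessor has nothing to substitute here.
__global__ void compute_sum1(const float *a, const float *b, float *c, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int t = 4 * idx;  // each thread handles four consecutive floats
    if (t + 3 < n) {
        c[t]     = a[t]     + b[t];
        c[t + 1] = a[t + 1] + b[t + 1];
        c[t + 2] = a[t + 2] + b[t + 2];
        c[t + 3] = a[t + 3] + b[t + 3];
    }
}

int main()
{
    const size_t bytes = NUM_ELEMENTS * sizeof(float);
    float *a, *b, *c;

    // Unified memory keeps the sketch short (an assumption, not from the question).
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < NUM_ELEMENTS; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    const int threads = 32;                           // block size implied by the question's index math
    const int blocks = NUM_ELEMENTS / (4 * threads);  // four floats per thread
    compute_sum1<<<blocks, threads>>>(a, b, c, NUM_ELEMENTS);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```

If a similar error shows up again, running nvcc -E on the file prints the preprocessed source, which makes this kind of macro substitution easy to spot.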