Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CUDA kernel doesn't launch

Tags:

cuda

gpgpu

gpu

My problem is very much like this one. I run the simplest CUDA program but the kernel doesn't launch. However, I am sure that my CUDA installation is ok, since I can run complicated CUDA projects consisting of several files (which I took from someone else) with no problems. In these projects, compilation and linking is done through makefiles with a lot of flags. I think the problem is in the correct flags to use while compiling. I simply use a command like this: nvcc -arch=sm_20 -lcudart test.cu with a such a program (to run on a linux machine):

 __global__ void myKernel() 
{ 

    cuPrintf("Hello, world from the device!\n"); 


} 
int main() 
{ 
    cudaPrintfInit(); 
    myKernel<<<1,10>>>(); 
    cudaPrintfDisplay(stdout, true);    
    cudaPrintfEnd(); 
} 

The program compiles correctly. When I add cudaMemcpy() operations, it returns no error. Any suggestion on why the kernel doesn't launch ?

like image 611
Tarek Avatar asked Aug 28 '12 17:08

Tarek


People also ask

How do I start CUDA kernel?

In order to launch a CUDA kernel we need to specify the block dimension and the grid dimension from the host code. I'll consider the same Hello World! code considered in the previous article. In the above code, to launch the CUDA kernel two 1's are initialised between the angle brackets.

Is CUDA kernel blocking?

In CUDA, kernel launches are asynchronous (often called “non-blocking”). An example of kernel execution from host perspective: 1. Host call starts the kernel execution.

Can a CUDA kernel launch another kernel?

CUDA Dynamic Parallelism Under dynamic parallelism, one kernel may launch another kernel, and that kernel may launch another, and so on. Each subordinate launch is considered a new “nesting level,” and the total number of levels is the “nesting depth” of the program.


2 Answers

The reason it is not printing when using printf is that kernel launches are asynchronous and your program is exiting before the printf buffer gets flushed. Section B.16 of the CUDA (5.0) C Programming Guide explains this.

The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten. It is flushed only when one of these actions is performed:

  • Kernel launch via <<<>>> or cuLaunchKernel() (at the start of the launch, and if the CUDA_LAUNCH_BLOCKING environment variable is set to 1, at the end of the launch as well),
  • Synchronization via cudaDeviceSynchronize(), cuCtxSynchronize(), cudaStreamSynchronize(), cuStreamSynchronize(), cudaEventSynchronize(), or cuEventSynchronize(),
  • Memory copies via any blocking version of cudaMemcpy*() or cuMemcpy*(),
  • Module loading/unloading via cuModuleLoad() or cuModuleUnload(),
  • Context destruction via cudaDeviceReset() or cuCtxDestroy().

For this reason, this program prints nothing:

#include <stdio.h>

__global__ void myKernel() 
{ 
  printf("Hello, world from the device!\n"); 
} 

int main() 
{ 
  myKernel<<<1,10>>>(); 
} 

But this program prints "Hello, world from the device!\n" ten times.

#include <stdio.h>

__global__ void myKernel() 
{ 
  printf("Hello, world from the device!\n"); 
} 

int main() 
{ 
  myKernel<<<1,10>>>(); 
  cudaDeviceSynchronize();
} 
like image 53
harrism Avatar answered Sep 21 '22 00:09

harrism


Are you sure that your CUDA device supports the SM_20 architecture?

Remove the arch= option from your nvcc command line and rebuild everything. This compiles for the 1.0 CUDA architecture, which will be supported on all CUDA devices. If it still doesn't run, do a build clean and make sure there are no object files left anywhere. Then rebuild and run.

Also, arch= refers to the virtual architecture, which should be something like compute_10. sm_20 is the real architecture and I believe should be used with the code= switch, not arch=.

like image 41
dthorpe Avatar answered Sep 21 '22 00:09

dthorpe