CUDA debugging, or how to get source code lines in cuda-gdb without disabling optimization?

I have a rather large and complex CUDA code that hangs quite reliably for large numbers of blocks/threads. I am trying to figure out exactly where the code hangs.

When I run the code in cuda-gdb, I can see which threads/blocks are hanging, but I can't see where, beyond the "virtual PC".

If I compile the code with "-G" to get the debug information, it runs a lot slower and refuses to hang, no matter how long I run it for.

Is there any way to map a "virtual PC" to a line of code in the source code, even approximately? Or is there a way to get the debugging information in without turning off all optimization?

I've tried "-G3", but to no avail: it only produces warnings of the type "nvcc warning : Setting optimization level to 0 as optimized debugging is not supported". I am using CUDA compilation tools release 4.1.

Pedro asked Jan 16 '23 16:01


1 Answer

Ok, I think I've figured it out on my own.

If cuobjdump is in the path, then in cuda-gdb, the command x/i $pc will show you the assembler instruction at which the current thread is stopped. The problem is that if the source was not compiled with -G, you won't be able to relate that instruction directly to a line in your code.

To match the assembler to the kernel code, first make sure that you compiled your kernel with nvcc -keep [..] mykernel.cu. This should generate the files mykernel.sm_20.cubin (or whatever arch you chose) and mykernel.ptx.

To get the assembler for your entire kernel, run cuobjdump -sass mykernel.sm_20.cubin > output.ptx. Despite the file name, this output is SASS (the machine-level assembler), not PTX. In cuda-gdb, do x/20i $pc-80 to get a bit of context, and look for those lines in the file output.ptx. You can then try to match those lines to the PTX code in mykernel.ptx, which contains .loc statements that refer to lines in the source.
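To make the .loc statements easier to use, here is a minimal Python sketch that tags every instruction in a PTX file with the source line from the most recent .loc directive. It assumes the directive has the form ".loc <file> <line> <column>". The function name annotate_ptx and the sample PTX are made up for illustration, so treat this as a starting point rather than a finished tool.

```python
import re

# Assumed directive form: ".loc <file_index> <line> <column>".
LOC_RE = re.compile(r"^\.loc\s+(\d+)\s+(\d+)\s+(\d+)")

def annotate_ptx(ptx_text):
    """Return (ptx_line_number, instruction, source_line) for each instruction.

    source_line comes from the most recent .loc directive, or is None if
    no .loc has been seen yet.
    """
    current = None
    out = []
    for n, raw in enumerate(ptx_text.splitlines(), start=1):
        line = raw.strip()
        m = LOC_RE.match(line)
        if m:
            current = int(m.group(2))
            continue
        # Skip blanks, comments, other directives, braces, parentheses and labels.
        if not line or line.startswith(("//", ".", "{", "}", "(", ")")):
            continue
        if line.endswith(":"):
            continue
        out.append((n, line, current))
    return out

# Hypothetical PTX fragment, not taken from a real kernel.
SAMPLE_PTX = """\
.loc 1 12 0
ld.global.f32 %f1, [%rd4];
fma.rn.f32 %f3, %f1, %f2, %f3;
.loc 1 14 0
st.global.f32 [%rd5], %f3;
"""

for ptx_line, instruction, source_line in annotate_ptx(SAMPLE_PTX):
    print(ptx_line, source_line, instruction)
```

On a real file you would pass in the contents of mykernel.ptx instead of SAMPLE_PTX.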

This approach requires a bit of creativity in matching the SASS from the cubin file to the PTX from nvcc, as the instructions do not correspond one-to-one and may be re-ordered somewhat. In my code, I had large blocks of FFMA instructions I could look for to get my bearings. You can use "output.ptx" to find the exact line from the debugger and then look in "mykernel.ptx" at the same relative position.
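The "same relative position" step can be roughed out in Python as well. This sketch assumes each line of the cuobjdump listing looks like "/*0008*/ FFMA R4, R2, R3, R4;", optionally with a hex encoding comment before the instruction; the exact layout differs between toolkit versions, so the regular expression may need adjusting. All function names and the sample listing are hypothetical.

```python
import re

# Assumed listing line: "/*0008*/  [/*0x...*/]  FFMA R4, R2, R3, R4;"
SASS_RE = re.compile(
    r"/\*([0-9a-fA-F]{4,})\*/\s+"          # instruction offset in hex
    r"(?:/\*\s*0x[0-9a-fA-F]+\s*\*/\s+)?"  # optional encoding comment
    r"([^;]+;)"                            # instruction text up to ';'
)

def sass_instructions(sass_text):
    """Return a list of (offset, instruction) pairs from a SASS listing."""
    found = []
    for line in sass_text.splitlines():
        m = SASS_RE.search(line)
        if m:
            found.append((int(m.group(1), 16), m.group(2).strip()))
    return found

def relative_position(instructions, offset):
    """Fraction (0.0 to 1.0) of the way through the kernel for an offset."""
    offsets = [o for o, _ in instructions]
    idx = offsets.index(offset)  # raises ValueError if the offset is absent
    return idx / max(len(offsets) - 1, 1)

def ptx_index_at(fraction, ptx_instruction_count):
    """Index of the PTX instruction at the same relative position."""
    return round(fraction * (ptx_instruction_count - 1))

# Hypothetical SASS listing, not taken from a real kernel.
SAMPLE_SASS = """\
        /*0000*/     MOV R1, c[0x1][0x100];
        /*0008*/     FFMA R4, R2, R3, R4;
        /*0010*/     FFMA R5, R2, R3, R5;
        /*0018*/     ST [R6], R4;
        /*0020*/     EXIT;
"""

insts = sass_instructions(SAMPLE_SASS)
fraction = relative_position(insts, 0x10)  # offset of the stopped thread
print(fraction, ptx_index_at(fraction, 9))
```

The index is only a first guess at where to start reading in mykernel.ptx; distinctive runs of instructions such as the FFMA blocks are still needed to confirm the match.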

This all involves quite a bit of work, but it does allow you to narrow down the location of the "virtual PC" in your original source.

Pedro answered Feb 06 '23 09:02