I have some kernels that I have written in both OpenCL and CUDA. When running OpenCL programs in the AMD profiler, it allows me to view the assembly code of the kernel. I would like to compare this with the assembly code of the CUDA kernels to compare compiler optimizations between the two languages. I have been playing around with the Nvidia Profiler, but am still at a loss on how to get the assembly code of my kernels. How does one go about doing this?
In order to compile CUDA code files, you have to use the nvcc compiler. CUDA code can only be compiled and executed on nodes that have a GPU. Heracles has 4 NVIDIA Tesla P100 GPUs on node18, and the CUDA compiler is installed there, so you need to ssh into node18 to compile CUDA programs.
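As a minimal sketch of that workflow (the program and file names here are only placeholders), a typical session on the GPU node looks like:

ssh node18
nvcc -o vector_add vector_add.cu
./vector_add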
PTX is a low-level parallel-thread-execution virtual machine and ISA (Instruction Set Architecture). PTX can be output from multiple tools or written directly by developers, and by following the ABI, external developers can generate compliant PTX code that can be linked with other code.
The closest that you can easily get to assembly on NVIDIA GPUs is PTX, which is a virtual assembly language that is compiled by the CUDA driver to the machine code of your GPU before execution.
As mentioned by turboscrew, the closest thing to assembly for CUDA is the PTX code. I thought it would be useful to add to this answer the method of actually generating that PTX code.
This can be generated in the following way:
nvcc -ptx -o kernel.ptx kernel.cu
Where kernel.cu is your source file and kernel.ptx is the destination PTX file.
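For illustration, a minimal kernel.cu to feed to that command could look like this (the kernel name and signature are made up for this example):

// kernel.cu - trivial element-wise add kernel (illustrative only)
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // bounds check
        c[i] = a[i] + b[i];                         // one element per thread
}

Running the nvcc -ptx command above on it produces kernel.ptx, a plain-text file you can open in any editor and compare side by side with the ISA output from the AMD profiler.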
Also, here is a link to NVIDIA's PTX documentation:
http://docs.nvidia.com/cuda/parallel-thread-execution/index.html
If you have some assembly knowledge, most of it is fairly straightforward. There are some special functions, though, that are worth looking up in the documentation for more details.
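To give a rough idea of what to expect, here is a hand-written sketch of the general shape of the PTX for a simple element-wise add kernel; real nvcc output will differ in names, register counts, and instruction ordering depending on the compiler version and target:

.version 6.0
.target sm_60
.address_size 64

.visible .entry vecAdd(
    .param .u64 vecAdd_param_0,   // const float *a
    .param .u64 vecAdd_param_1,   // const float *b
    .param .u64 vecAdd_param_2,   // float *c
    .param .u32 vecAdd_param_3    // int n
)
{
    .reg .pred  %p<2>;
    .reg .f32   %f<4>;
    .reg .b32   %r<6>;
    .reg .b64   %rd<11>;

    ld.param.u64    %rd1, [vecAdd_param_0];
    ld.param.u64    %rd2, [vecAdd_param_1];
    ld.param.u64    %rd3, [vecAdd_param_2];
    ld.param.u32    %r2,  [vecAdd_param_3];
    mov.u32         %r3, %ctaid.x;          // blockIdx.x
    mov.u32         %r4, %ntid.x;           // blockDim.x
    mov.u32         %r5, %tid.x;            // threadIdx.x
    mad.lo.s32      %r1, %r3, %r4, %r5;     // global index i
    setp.ge.s32     %p1, %r1, %r2;          // i >= n ?
    @%p1 bra        DONE;

    cvta.to.global.u64  %rd4, %rd1;
    cvta.to.global.u64  %rd5, %rd2;
    cvta.to.global.u64  %rd6, %rd3;
    mul.wide.s32    %rd7, %r1, 4;           // byte offset i * sizeof(float)
    add.s64         %rd8, %rd4, %rd7;
    add.s64         %rd9, %rd5, %rd7;
    add.s64         %rd10, %rd6, %rd7;
    ld.global.f32   %f1, [%rd8];            // a[i]
    ld.global.f32   %f2, [%rd9];            // b[i]
    add.f32         %f3, %f1, %f2;
    st.global.f32   [%rd10], %f3;           // c[i] = a[i] + b[i]
DONE:
    ret;
}

The special registers (%tid, %ctaid, %ntid) and the handful of CUDA-specific instructions are covered in the PTX ISA reference linked above.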