I have some kernels that I have written in both OpenCL and CUDA. When running OpenCL programs in the AMD profiler, it allows me to view the assembly code of the kernel. I would like to compare this with the assembly code of the CUDA kernels to compare compiler optimizations between the two languages. I have been playing around with the Nvidia Profiler, but am still at a loss on how to get the assembly code of my kernels. How does one go about doing this?
In order to compile CUDA code files, you have to use the nvcc compiler. CUDA code can only be compiled and executed on nodes that have a GPU. Heracles has 4 NVIDIA Tesla P100 GPUs on node18, and the CUDA compiler is installed there, so you need to ssh into node18 to compile CUDA programs.
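As a minimal sketch of that workflow (the program and file names here are only placeholders), a typical session on the GPU node looks like:

ssh node18
nvcc -o vector_add vector_add.cu
./vector_add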
PTX is a low-level parallel-thread-execution virtual machine and ISA (Instruction Set Architecture). PTX can be output from multiple tools or written directly by developers, and by following the ABI, external developers can generate compliant PTX code that can be linked with other code.
The closest that you can easily get to assembly on NVIDIA GPUs is PTX, which is a virtual assembly language that is compiled by the CUDA driver to the machine code of your GPU before execution.
As mentioned by turboscrew, the closest thing to assembly for CUDA is the PTX code. I thought it would be useful to add to this answer the method of actually generating that PTX code.
This can be generated in the following way:
nvcc -ptx -o kernel.ptx kernel.cu
Where kernel.cu is your source file and kernel.ptx is the destination PTX file.
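For illustration, a minimal kernel.cu to feed to that command could look like this (the kernel name and signature are made up for this example):

// kernel.cu - trivial element-wise add kernel (illustrative only)
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // bounds check
        c[i] = a[i] + b[i];                         // one element per thread
}

Running the nvcc -ptx command above on it produces kernel.ptx, a plain-text file you can open in any editor and compare side by side with the ISA output from the AMD profiler.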
Also, here is a link to NVIDIA's PTX documentation:
http://docs.nvidia.com/cuda/parallel-thread-execution/index.html
If you have some assembly knowledge, most of it is fairly straightforward. There are some special functions, though, that are worth looking up in the documentation for more details.
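To give a rough idea of what to expect, here is a hand-written sketch of the general shape of the PTX for a simple element-wise add kernel; real nvcc output will differ in names, register counts, and instruction ordering depending on the compiler version and target:

.version 6.0
.target sm_60
.address_size 64

.visible .entry vecAdd(
    .param .u64 vecAdd_param_0,   // const float *a
    .param .u64 vecAdd_param_1,   // const float *b
    .param .u64 vecAdd_param_2,   // float *c
    .param .u32 vecAdd_param_3    // int n
)
{
    .reg .pred  %p<2>;
    .reg .f32   %f<4>;
    .reg .b32   %r<6>;
    .reg .b64   %rd<11>;

    ld.param.u64    %rd1, [vecAdd_param_0];
    ld.param.u64    %rd2, [vecAdd_param_1];
    ld.param.u64    %rd3, [vecAdd_param_2];
    ld.param.u32    %r2,  [vecAdd_param_3];
    mov.u32         %r3, %ctaid.x;          // blockIdx.x
    mov.u32         %r4, %ntid.x;           // blockDim.x
    mov.u32         %r5, %tid.x;            // threadIdx.x
    mad.lo.s32      %r1, %r3, %r4, %r5;     // global index i
    setp.ge.s32     %p1, %r1, %r2;          // i >= n ?
    @%p1 bra        DONE;

    cvta.to.global.u64  %rd4, %rd1;
    cvta.to.global.u64  %rd5, %rd2;
    cvta.to.global.u64  %rd6, %rd3;
    mul.wide.s32    %rd7, %r1, 4;           // byte offset i * sizeof(float)
    add.s64         %rd8, %rd4, %rd7;
    add.s64         %rd9, %rd5, %rd7;
    add.s64         %rd10, %rd6, %rd7;
    ld.global.f32   %f1, [%rd8];            // a[i]
    ld.global.f32   %f2, [%rd9];            // b[i]
    add.f32         %f3, %f1, %f2;
    st.global.f32   [%rd10], %f3;           // c[i] = a[i] + b[i]
DONE:
    ret;
}

The special registers (%tid, %ctaid, %ntid) and the handful of CUDA-specific instructions are covered in the PTX ISA reference linked above.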