I'm not sure if this is possible. I want to study OpenCL in depth, so I was wondering whether there is a tool to disassemble a compiled OpenCL kernel. For a normal x86 executable, I can use objdump to get a disassembly view. Is there a similar tool for OpenCL kernels yet?

Disassemble an OpenCL kernel?

3 Answers

If you're using NVIDIA's OpenCL implementation for their GPUs, you can do the following to disassemble an OpenCL kernel:

1. Use clGetProgramInfo() with CL_PROGRAM_BINARIES to dump the PTX code to a file, say ptxfile.ptx. Refer to the OpenCL specification for more details on this function.
2. Use nvcc to compile the PTX to a cubin file. For example, nvcc -cubin -arch=sm_20 ptxfile.ptx compiles ptxfile.ptx for a compute capability 2.0 device.
3. Use cuobjdump to disassemble the cubin file into GPU instructions. For example: cuobjdump -sass ptxfile.cubin
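
The three steps can be scripted. The following is a minimal sketch, not part of the original answer. It assumes the third-party pyopencl package for the dump step (its program_info.BINARIES query corresponds to clGetProgramInfo() with CL_PROGRAM_BINARIES), and that nvcc and cuobjdump are on the PATH. The function names are hypothetical, and sm_20 is kept from the example above; recent CUDA toolkits no longer accept it, so pass an architecture your toolkit supports.

```python
import subprocess
from pathlib import Path


def dump_program_binary(kernel_source, out_path="ptxfile.ptx"):
    """Step 1: build an OpenCL program and write its first device binary to a file.

    Assumes pyopencl and an OpenCL driver are installed. On NVIDIA's
    implementation the returned "binary" is PTX text.
    """
    import pyopencl as cl  # imported lazily so the helpers below work without it

    ctx = cl.create_some_context(interactive=False)
    program = cl.Program(ctx, kernel_source).build()
    # One bytes object per device the program was built for.
    binaries = program.get_info(cl.program_info.BINARIES)
    out = Path(out_path)
    out.write_bytes(binaries[0])
    return out


def disassembly_commands(ptx_path, arch="sm_20"):
    """Steps 2 and 3: return the nvcc and cuobjdump command lines as argv lists."""
    cubin = str(Path(ptx_path).with_suffix(".cubin"))
    return [
        ["nvcc", "-cubin", f"-arch={arch}", ptx_path, "-o", cubin],
        ["cuobjdump", "-sass", cubin],
    ]


def disassemble(ptx_path, arch="sm_20"):
    """Run both commands and return the SASS listing printed by cuobjdump."""
    compile_cmd, dump_cmd = disassembly_commands(ptx_path, arch)
    subprocess.run(compile_cmd, check=True)
    result = subprocess.run(dump_cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

A typical use would be to call dump_program_binary(source) once and then print(disassemble("ptxfile.ptx", arch=...)) with the architecture of the installed GPU.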

Hope this helps.


answered Oct 11 '22 10:10

Zk1001

The simplest solution, in my experience, is to use Clang's OpenCL C compiler and emit SPIR. It even works on Godbolt's Compiler Explorer: https://godbolt.org/z/_JbXPb

Clang can also emit PTX (https://godbolt.org/z/4ARMqM) and amdhsa (https://godbolt.org/z/TduTZQ), but the output may not correspond to the PTX and amdhsa assembly generated by the respective driver at runtime.
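
For running Clang locally instead of on Compiler Explorer, here is a rough sketch of the command lines involved. The target triples, the -mcpu value, and the OpenCL C version are assumptions chosen for illustration and are not taken from the linked examples; the helper name clang_command is hypothetical. Adjust them to the installed Clang version and the GPU in question.

```python
import subprocess

# Per-target flags. These triples and the gfx900 CPU are illustrative
# assumptions; pick the ones matching your hardware and Clang build.
TARGET_FLAGS = {
    "spir": ["-target", "spir64-unknown-unknown", "-emit-llvm"],
    "ptx": ["-target", "nvptx64-unknown-unknown"],
    "amdhsa": ["-target", "amdgcn-amd-amdhsa", "-mcpu=gfx900"],
}


def clang_command(kernel_path, kind, out_path):
    """Build a Clang argv list that compiles an OpenCL C file to textual output."""
    return [
        "clang",
        "-x", "cl",                            # treat the input as OpenCL C
        "-cl-std=CL1.2",                       # assumed language version
        "-Xclang", "-finclude-default-header", # declare the OpenCL builtins
        "-O2",
        "-S",                                  # emit text instead of an object file
        *TARGET_FLAGS[kind],
        kernel_path,
        "-o", out_path,
    ]


def compile_kernel(kernel_path, kind, out_path):
    """Run Clang; raises CalledProcessError if compilation fails."""
    subprocess.run(clang_command(kernel_path, kind, out_path), check=True)
```

As the answer notes, what Clang emits this way is only an approximation of what a vendor driver generates at runtime.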

answered Oct 11 '22 08:10

Andreas Gravgaard Andersen

I know that this is an old question, but in case someone comes looking here for how to disassemble an AMD GPU kernel, you can do the following on Linux:

export GPU_DUMP_DEVICE_KERNEL=3

This makes any kernel that is compiled on your machine dump its assembled code to a file in the same directory.
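
The same variable can be set from inside a host program, provided it is set before the kernel is built. The sketch below is an illustration under that assumption; the helper names are hypothetical. Because the names of the dumped files depend on the driver, it simply lists whatever new files appear in the directory.

```python
import os
from pathlib import Path


def enable_kernel_dump(level="3"):
    """Set the AMD dump variable for this process and its children.

    Assumption: the runtime reads the variable when the kernel is compiled,
    so call this before building any OpenCL program.
    """
    os.environ["GPU_DUMP_DEVICE_KERNEL"] = level


def snapshot(directory="."):
    """Return the set of file names currently in the directory."""
    return {p.name for p in Path(directory).iterdir() if p.is_file()}


def new_files(before, directory="."):
    """Return the sorted names of files created since the snapshot was taken."""
    return sorted(snapshot(directory) - before)
```

Usage would be: call enable_kernel_dump(), take before = snapshot(), build the OpenCL program, then inspect new_files(before) to find the dumps.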

Source: http://dis.unal.edu.co/~gjhernandezp/TOS/GPU/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf (Sections 4.2.1 and 4.2.2)

answered Oct 11 '22 09:10

KLee1

Tags: gpgpu, gpu, disassembly, opencl

Patrick
