What's the difference between PTX and CUBIN w.r.t. the NVCC compiler?

Tags:

I have CUDA 4.0 installed, and a device with Compute Capability 2.0 (a GTX 460 card).

What is the difference between the 'cubin' and the 'ptx' file?

I think the cubin is a native code for the gpu so this is micro-architecture specific, and the ptx is an intermediate language that run on Fermi devices (e.g. Geforce GTX 460) via JIT compilation. When I compile a .cu source file, I can choose between the ptx or cubin target. If I want the cubin file, I choose the code=sm_20. But if I want a ptx file I use the code=compute_20.

Is it correct?

976

asked Oct 08 '11 10:10

user973764

1 Answers

You have mixed up the options to select a compilation phase (-ptx and -cubin) with the options to control which devices to target (-code), so you should revisit the documentation.

NVCC is the NVIDIA compiler driver. The -ptx and -cubin options are used to select specific phases of compilation, by default, without any phase-specific options nvcc will attempt to produce an executable from the inputs. Most people use the -c option to cause nvcc to produce an object file which will later be linked into an executable by the default platform linker, the -ptx and -cubin options are only really useful if you are using the Driver API. For more information on the intermediate stages, check out the nvcc manual which is installed when you install the CUDA Toolkit.

The output from -ptx is a plain-text PTX file. PTX is an intermediate assembly language for NVIDIA GPUs which has not yet been fully optimised and will later be assembled to the device-specific code (different devices have different register counts for example, hence fully optimising PTX would be wrong).
The output from -cubin is a fat binary which may contain one or more device-specific binary images as well as (optionally) PTX.

The -code argument you refer to has a different purpose entirely. I'd encourage you to check out the nvcc documentation which contains several examples, in general I would advise using the -gencode option instead since it allows more control and allows you to target multiple devices in one binary. As a quick example:

-gencode arch=compute_xx,code=\'compute_xx,sm_yy,sm_zz\' causes nvcc to target all devices with compute capability xx (that's the arch= bit) and to embed PTX (code=compute_xx) as well as device specific binaries for sm_yy and sm_zz into the final fat binary.

answered Sep 18 '22 11:09

Tom

Related questions
                            
                                Cuda parallel code generation in Visual Studio
                            
                                NVIDIA CUDA: What is the developer driver?
                            
                                Aligning GPU memory accesses of an image convolution (OpenCL/CUDA) kernel
                            
                                Does AMD's OpenCL offer something similar to CUDA's GPUDirect?
                            
                                Matrix-vector multiplication in CUDA: benchmarking & performance
                            
                                What are the default values for arch and code options when using nvcc?
                            
                                Branch predication on GPU
                            
                                Building a tiny R package with CUDA and Rcpp
                            
                                Getting starting with Parallel programming [closed]
                            
                                How to get VS 2010 to recognize certain CUDA functions
                            
                                Why does the Cuda runtime reserve 80 GiB virtual memory upon initialization?
                            
                                cuda shared library linking: undefined reference to cudaRegisterLinkedBinary
                            
                                Integer min/max in CUDA
                            
                                CUDA surfaces vs textures
                            
                                Limiting register usage in CUDA: __launch_bounds__ vs maxrregcount
                            
                                Can nvidia-docker be run without a GPU?
                            
                                Trying to 'Make' CUDA SDK, ld cannot find library, ldconfig says it can
                            
                                Compiling Cuda code in Qt Creator on Windows
                            
                                CMake: how to add cuda to existing project
                            
                                Simple CUDA Test always fails with "an illegal memory access was encountered" error

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What's the difference between PTX and CUBIN w.r.t. the NVCC compiler?

Tags:

cuda

nvidia

nvcc

ptx

user973764

People also ask

1 Answers

Tom

Recent Activity

Donate For Us