Difference between cuda.h, cuda_runtime.h, cuda_runtime_api.h

Tags:

header-files

cuda

I'm starting to program with CUDA, and in some examples I find the include files cuda.h, cuda_runtime.h and cuda_runtime_api.h included in the code. Can someone explain to me the difference between these files?

905

asked Jun 10 '11 06:06

Renan

2 Answers

In very broad terms:

cuda.h defines the public host functions and types for the CUDA driver API.
cuda_runtime_api.h defines the public host functions and types for the CUDA runtime API.
cuda_runtime.h defines everything cuda_runtime_api.h does, as well as built-in type definitions and function overlays for the CUDA language extensions and device intrinsic functions.

If you are writing host code that makes API calls and will be compiled with the host compiler, include either cuda.h (for the driver API) or cuda_runtime_api.h (for the runtime API). If you also need other CUDA language built-ins, such as types, while using the runtime API and compiling with the host compiler, include cuda_runtime.h. If the code will be compiled with nvcc, the choice is largely irrelevant for the runtime API, because nvcc includes the required headers automatically (it implicitly includes cuda_runtime.h for .cu files) without programmer intervention.
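To make the host-compiler case concrete, here is a minimal sketch of a plain C++ file that uses the runtime API through cuda_runtime_api.h only. The file name `host_query.cpp`, the helper name `visible_device_count`, and the `__has_include` guard are my own assumptions for illustration. The guard is there only so the file still builds on a machine without the toolkit, and you would normally include the header unconditionally.

```cpp
// host_query.cpp (hypothetical file name): ordinary C++ translation unit,
// meant for the host compiler (g++, clang++, MSVC) rather than nvcc.
#include <cstdio>

// Assumption for this sketch: detect the header so the file compiles even
// where the CUDA toolkit is not installed.
#if defined(__has_include)
#  if __has_include(<cuda_runtime_api.h>)
#    include <cuda_runtime_api.h>  // runtime API declarations, no language extensions
#    define HAVE_CUDA_RUNTIME_API 1
#  endif
#endif
#ifndef HAVE_CUDA_RUNTIME_API
#  define HAVE_CUDA_RUNTIME_API 0
#endif

// Hypothetical helper: returns the number of CUDA devices, or -1 if the
// header was not found at compile time or the runtime call failed.
int visible_device_count() {
#if HAVE_CUDA_RUNTIME_API
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count;
#else
    return -1;
#endif
}
```

When the toolkit is present, a file like this has to be linked against the runtime library, typically with `-lcudart` plus the include and library paths of your installation. Those paths depend on your system.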

141

answered Sep 28 '22 02:09

talonmies

A few observations in addition to @talonmies' answer:

cuda_runtime.h includes cuda_runtime_api.h internally, but not the other way around. So: "runtime includes all of runtime_api" is a mnemonic to remember.
cuda_runtime_api.h does not declare every runtime API function you'll find in the official documentation, while cuda_runtime.h has them all (example: the overload of cudaEventCreate() that takes a flags argument). However, all API calls defined in cuda_runtime.h are actually implemented in the header file itself, using calls to functions in cuda_runtime_api.h. These are the "function overlays" that @talonmies mentioned.
cuda_runtime_api.h is a C-language header (if I am not mistaken) with only C-language function declarations; cuda_runtime.h is a C++ header file, with some templated functions implemented.
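The layering described above can be sketched without the CUDA toolkit. The following is a mock, not the real headers: every `mock*` name is hypothetical, and the two parts only imitate the shape of a C-only API header with C++ overlays on top of it, as I understand the arrangement.

```cpp
#include <stdlib.h>

// --- Part 1: stand-in for a C-only "api" header (all names hypothetical) ---
// In a real library these would be declarations, with definitions in a .so/.dll.
extern "C" {
    typedef int mockError_t;                              // 0 = success
    typedef struct MockEvent { unsigned flags; } *mockEvent_t;

    mockError_t mockMalloc(void** ptr, size_t bytes) {
        *ptr = malloc(bytes);
        return *ptr ? 0 : 1;
    }
    mockError_t mockFree(void* ptr) { free(ptr); return 0; }

    mockError_t mockEventCreateWithFlags(mockEvent_t* ev, unsigned flags) {
        *ev = static_cast<mockEvent_t>(malloc(sizeof(struct MockEvent)));
        if (!*ev) return 1;
        (*ev)->flags = flags;
        return 0;
    }
    mockError_t mockEventCreate(mockEvent_t* ev) {
        return mockEventCreateWithFlags(ev, 0u);
    }
    mockError_t mockEventDestroy(mockEvent_t ev) { free(ev); return 0; }
}

// --- Part 2: stand-in for the C++ header layered on top ("overlays") ---
// Typed overload: accepts T** so callers need no cast; forwards to the C function.
template <class T>
mockError_t mockMalloc(T** ptr, size_t bytes) {
    return ::mockMalloc(reinterpret_cast<void**>(ptr), bytes);
}

// Extra overload that exists only on the C++ side; implemented in the header
// itself by calling a function from the C part.
inline mockError_t mockEventCreate(mockEvent_t* ev, unsigned flags) {
    return ::mockEventCreateWithFlags(ev, flags);
}

// Allocates through the typed overlay (T = int) and reads a value back.
int typed_alloc_roundtrip() {
    int* p = nullptr;
    if (mockMalloc(&p, sizeof(int)) != 0) return -1;
    *p = 42;
    int value = *p;
    mockFree(p);
    return value;
}

// Creates an event through the two-argument overlay and returns its flags.
int event_flags_roundtrip() {
    mockEvent_t ev = nullptr;
    if (mockEventCreate(&ev, 2u) != 0) return -1;
    int flags = static_cast<int>(ev->flags);
    mockEventDestroy(ev);
    return flags;
}
```

A C compiler could consume Part 1 on its own (minus the `extern "C"` wrapper and the `static_cast`), while Part 2 needs C++ because it relies on templates and overloading. That is the same split as in the last point above.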

answered Sep 28 '22 03:09

einpoklum
