Why should I use the CUDA Driver API instead of CUDA Runtime API?

2 Answers

The runtime API is a higher-level abstraction over the driver API and is usually easier to use (e.g. you can launch kernels with the kernel<<<>>> syntax); the performance gap should be minimal. The driver API is handle-based and provides a higher degree of control.
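To illustrate the kernel<<<>>> launch syntax mentioned above, here is a minimal runtime API sketch. The kernel `scale` and the helper `scaleOnDevice` are names made up for this example, not part of CUDA; the block is meant to be compiled with nvcc.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Illustrative kernel: multiplies each of the n elements by factor.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: copies host data to the device, scales it there,
// and copies it back. Returns the first error encountered.
cudaError_t scaleOnDevice(float* host, float factor, int n) {
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);
    float* dev = nullptr;

    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        // Runtime API launch: no module or function handles needed.
        scale<<<blocks, threads>>>(dev, factor, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err;
}
```

Note that there is no explicit initialization, context, or module handling here; the runtime does all of that behind the scenes.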

That "higher degree of control" means that with the driver API you have to handle module initialization and memory management in a more verbose way, but in exchange you can do more, e.g. disable the driver's JIT optimizations for the kernel code:

CU_JIT_OPTIMIZATION_LEVEL - Level of optimizations to apply to generated code (0 - 4), with 4 being the default and highest level of optimizations. Option type: unsigned int

From http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDA__TYPES_gfaa9995214a4f3341f48c5830cea0d8a.html

This isn't currently possible from code with the runtime API. A finer degree of control also means that you can break things or make them slower, so don't touch these options unless you know what they do.
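As a rough sketch of how the CU_JIT_OPTIMIZATION_LEVEL option quoted above can be passed, the snippet below JIT-compiles a tiny PTX string with cuModuleLoadDataEx. The helper name `jitCompileAtLevel`, the embedded `noop` PTX kernel, and the use of device 0 with its primary context are assumptions made for this example. Link against the driver library (e.g. -lcuda).

```cuda
#include <cstdint>
#include <cuda.h>

// Assumed for illustration: a trivial, empty PTX kernel named "noop".
static const char* kPtx =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop()\n"
    "{\n"
    "    ret;\n"
    "}\n";

// Hypothetical helper: JIT-compiles kPtx on device 0 at the given
// optimization level (0 - 4) and returns the driver's result code.
CUresult jitCompileAtLevel(unsigned int level) {
    CUresult rc = cuInit(0);
    if (rc != CUDA_SUCCESS) return rc;

    CUdevice dev;
    rc = cuDeviceGet(&dev, 0);
    if (rc != CUDA_SUCCESS) return rc;

    // Use the device's primary context instead of creating a new one.
    CUcontext ctx;
    rc = cuDevicePrimaryCtxRetain(&ctx, dev);
    if (rc != CUDA_SUCCESS) return rc;

    rc = cuCtxSetCurrent(ctx);
    if (rc == CUDA_SUCCESS) {
        // Option values are passed by value, cast to void*.
        CUjit_option keys[] = { CU_JIT_OPTIMIZATION_LEVEL };
        void* values[] = {
            reinterpret_cast<void*>(static_cast<std::uintptr_t>(level))
        };

        CUmodule module;
        rc = cuModuleLoadDataEx(&module, kPtx, 1, keys, values);
        if (rc == CUDA_SUCCESS) cuModuleUnload(module);

        cuCtxSetCurrent(nullptr);  // unbind before releasing the context
    }
    cuDevicePrimaryCtxRelease(dev);
    return rc;
}
```

Calling `jitCompileAtLevel(0)` asks the JIT compiler to skip optimizations; in a real program you would keep the module around and fetch kernels from it rather than unloading it right away.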

You should usually stick to either the runtime API or the driver API within an application, although with newer CUDA versions runtime API code can peacefully coexist with driver API code (http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf):

An application can mix runtime API code with driver API code.
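A small sketch of what such mixing can look like: the runtime is initialized first, the driver API then picks up the context the runtime made current, and memory allocated through the driver API is used by a runtime call. The helper name `mixApis` is made up for this example, and cudaFree(0) is used only as a common idiom to force runtime initialization. The program needs both the runtime and the driver library at link time (e.g. -lcuda).

```cuda
#include <cstdint>
#include <cuda.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if the mixed runtime/driver sequence
// succeeded, false if any step failed (e.g. no usable GPU).
bool mixApis() {
    // Runtime API: force initialization, which makes the device's
    // primary context current on this thread.
    if (cudaFree(0) != cudaSuccess) return false;

    // Driver API: retrieve the context the runtime is using.
    CUcontext ctx = nullptr;
    if (cuCtxGetCurrent(&ctx) != CUDA_SUCCESS || ctx == nullptr) return false;

    // Driver API: allocate device memory in that context.
    CUdeviceptr dptr = 0;
    if (cuMemAlloc(&dptr, 256) != CUDA_SUCCESS) return false;

    // Runtime API: a CUdeviceptr can be cast to a regular pointer.
    void* p = reinterpret_cast<void*>(static_cast<std::uintptr_t>(dptr));
    bool ok = (cudaMemset(p, 0, 256) == cudaSuccess);

    // Driver API: free the allocation.
    ok = (cuMemFree(dptr) == CUDA_SUCCESS) && ok;
    return ok;
}
```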

answered Nov 11 '22 20:11

Marco A.

To add to and expand on the excellent answer by @Marco: one major feature that the driver API makes available is loading kernels at runtime. This is covered by the module portion of the driver API; here is the overview:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#module

With the runtime API, all kernels are automatically loaded during initialization and stay loaded for as long as the program runs. With the driver API, the programmer has explicit control over loading and unloading kernels. This can be used, for instance, to download updated kernel versions from the Internet. Another use is keeping only the currently relevant modules loaded, although this is rarely a concern given the typically small size of kernels relative to the rest of the program.
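The explicit load/unload cycle described above could look roughly like this. The helper name `runKernelFromFile` is made up for this example, and it assumes that cuInit() has already been called, that a context is current on the calling thread, and that the kernel being launched takes no parameters.

```cuda
#include <cuda.h>

// Hypothetical helper: loads a module (PTX or cubin file) at runtime,
// launches one kernel from it with a single thread, then unloads it.
// Assumes an initialized driver and a current context.
CUresult runKernelFromFile(const char* modulePath, const char* kernelName) {
    CUmodule module;
    CUresult rc = cuModuleLoad(&module, modulePath);
    if (rc != CUDA_SUCCESS) return rc;

    // Look up the kernel by name to get a function handle.
    CUfunction kernel;
    rc = cuModuleGetFunction(&kernel, module, kernelName);
    if (rc == CUDA_SUCCESS) {
        // 1x1x1 grid of 1x1x1 blocks, no shared memory, default stream,
        // no kernel parameters (assumed for this sketch).
        rc = cuLaunchKernel(kernel,
                            1, 1, 1,
                            1, 1, 1,
                            0, nullptr,
                            nullptr, nullptr);
    }
    if (rc == CUDA_SUCCESS) {
        // Wait for the kernel before the module goes away.
        rc = cuCtxSynchronize();
    }

    cuModuleUnload(module);
    return rc;
}
```

Since the module path is just a file name, it could just as well point to a file that was downloaded or regenerated while the program was running.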

[Update: deleted irrelevant stuff]

answered Nov 11 '22 20:11

void_ptr


Tags:

cuda

gpgpu

nvidia

Alex
