Why should I use the CUDA Driver API, and in which cases can't I use the CUDA Runtime API (which is more convenient than the Driver API)?
The CUDA runtime eases device code management by providing implicit initialization, context management, and module management. The C host code generated by nvcc is based on the CUDA runtime (see Section 4.2.5), so applications that link to this code must use the CUDA runtime API.
The NVIDIA driver includes the driver kernel module and user-space libraries. The CUDA Toolkit is an SDK that contains the compiler, APIs, libraries, documentation, and so on.
CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach called general-purpose computing on GPUs (GPGPU).
The API runtime platform enables the execution of the APIs. It enables the API to receive requests from apps or Web sites and send responses back. Most commonly, the API platform is an HTTP server, which allows exposing services via HTTP. HTTP is the common protocol for REST APIs.
The runtime API is a higher level of abstraction over the driver API and is usually easier to use (the performance gap should be minimal); for example, you can use the kernel<<<>>> launch syntax. The driver API is handle-based and provides a higher degree of control.
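To make the convenience of the runtime API concrete, here is a minimal sketch (not taken from the original answer). The kernel `add_one` and the helper `run_add_one` are hypothetical names chosen for illustration. Note that there is no explicit initialization, context, or module handling anywhere: the first runtime call takes care of it.

```cuda
// Build (assumed): nvcc runtime_demo.cu -o runtime_demo
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread increments one element.
__global__ void add_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Runs add_one on n zeros. Returns how many elements ended up equal to 1,
// or -1 if any runtime call failed.
int run_add_one(int n) {
    int* d = nullptr;
    // The first runtime call initializes the runtime and context implicitly.
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d, 0, n * sizeof(int)) != cudaSuccess) { cudaFree(d); return -1; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads>>>(d, n);  // runtime-only launch syntax
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d); return -1; }

    std::vector<int> h(n, 0);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    int ok = 0;
    for (int v : h) ok += (v == 1);
    return ok;
}

int main() {
    const int n = 1000;
    int ok = run_add_one(n);
    std::printf("%d of %d elements incremented\n", ok, n);
    return ok == n ? 0 : 1;
}
```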
That "higher degree of control" means that with the driver API you have to deal with module initialization and memory management in a more verbose way, but that allows you to do more stuff, e.g. disable the driver JIT optimizations for the kernel code:
CU_JIT_OPTIMIZATION_LEVEL - Level of optimizations to apply to generated code (0 - 4), with 4 being the default and highest level of optimizations. Option type: unsigned int
From http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDA__TYPES_gfaa9995214a4f3341f48c5830cea0d8a.html
This isn't currently possible from code with the runtime API. A finer degree of control also means that you might break things or make them slower, so don't use these options if you don't know what they do.
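As an illustration of the JIT option quoted above, the following sketch loads a PTX module through the driver API with `CU_JIT_OPTIMIZATION_LEVEL` set to 0. Several things here are assumptions, not part of the original answer: the embedded PTX (a do-nothing kernel named `noop`, with `.version`/`.target` values you may need to adjust for your driver and GPU), the helper name `load_with_opt_level`, the `CU_TRY` macro, and the use of the primary context on device 0. Error paths return early without cleanup to keep the sketch short.

```cuda
// Build (assumed): nvcc jit_demo.cu -o jit_demo -lcuda
#include <cstdint>
#include <cstdio>
#include <cuda.h>

// Hand-written PTX for an empty kernel; header values are assumptions.
static const char* kNoopPtx =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop()\n"
    "{\n"
    "    ret;\n"
    "}\n";

// Hypothetical helper macro: print the driver error and return 1.
#define CU_TRY(call)                                              \
    do {                                                          \
        CUresult r_ = (call);                                     \
        if (r_ != CUDA_SUCCESS) {                                 \
            const char* msg_ = nullptr;                           \
            cuGetErrorString(r_, &msg_);                          \
            std::fprintf(stderr, "%s failed: %s\n", #call,        \
                         msg_ ? msg_ : "unknown error");          \
            return 1;                                             \
        }                                                         \
    } while (0)

// JIT-compiles kNoopPtx at the given optimization level and launches it once.
// Returns 0 on success, 1 on any driver error.
int load_with_opt_level(unsigned int level) {
    CU_TRY(cuInit(0));  // explicit initialization is required with the driver API
    CUdevice dev;
    CU_TRY(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CU_TRY(cuDevicePrimaryCtxRetain(&ctx, dev));
    CU_TRY(cuCtxSetCurrent(ctx));

    // Option values are passed by value, packed into void* slots.
    CUjit_option options[] = {CU_JIT_OPTIMIZATION_LEVEL};
    void* values[] = {reinterpret_cast<void*>(static_cast<uintptr_t>(level))};

    CUmodule mod;
    CU_TRY(cuModuleLoadDataEx(&mod, kNoopPtx, 1, options, values));
    CUfunction fn;
    CU_TRY(cuModuleGetFunction(&fn, mod, "noop"));
    CU_TRY(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, nullptr, nullptr));
    CU_TRY(cuCtxSynchronize());

    CU_TRY(cuModuleUnload(mod));
    CU_TRY(cuDevicePrimaryCtxRelease(dev));
    return 0;
}

int main() {
    int rc = load_with_opt_level(0);  // 0 = lowest JIT optimization level
    std::printf(rc == 0 ? "module loaded and kernel launched\n" : "failed\n");
    return rc;
}
```

The option only affects PTX that the driver compiles at load time; a module that already contains machine code for the target GPU is not recompiled.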
You should usually use only one of the two APIs in your application, although with newer CUDA versions runtime API code can peacefully coexist with driver API code (http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf):
An application can mix runtime API code with driver API code.
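A small sketch of such mixing, under stated assumptions: the runtime is asked to set up its context first (via `cudaSetDevice` and the common `cudaFree(0)` idiom), and the driver API then picks up that same context. The helper name `mix_apis` and the value 42 are made up for illustration. Memory allocated with the driver's `cuMemAlloc` is written with the runtime's `cudaMemcpy` and read back with the driver's `cuMemcpyDtoH`.

```cuda
// Build (assumed): nvcc mix_demo.cu -o mix_demo -lcuda
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

// Round-trips one int through device memory using both APIs.
// Returns the value read back, or -1 on any error.
int mix_apis() {
    // Runtime side: select a device and force context creation.
    if (cudaSetDevice(0) != cudaSuccess) return -1;
    if (cudaFree(0) != cudaSuccess) return -1;

    // Driver side: the runtime's context should now be current on this thread.
    if (cuInit(0) != CUDA_SUCCESS) return -1;
    CUcontext ctx = nullptr;
    if (cuCtxGetCurrent(&ctx) != CUDA_SUCCESS || ctx == nullptr) return -1;

    // Allocate with the driver API.
    CUdeviceptr dptr = 0;
    if (cuMemAlloc(&dptr, sizeof(int)) != CUDA_SUCCESS) return -1;

    int in = 42, out = 0;
    // Write with the runtime API; a CUdeviceptr can be cast to a regular pointer.
    cudaError_t e = cudaMemcpy(reinterpret_cast<void*>(dptr), &in, sizeof(int),
                               cudaMemcpyHostToDevice);
    // Read back with the driver API.
    CUresult r = cuMemcpyDtoH(&out, dptr, sizeof(int));
    cuMemFree(dptr);

    if (e != cudaSuccess || r != CUDA_SUCCESS) return -1;
    return out;
}

int main() {
    int v = mix_apis();
    std::printf("value read back: %d\n", v);
    return v == 42 ? 0 : 1;
}
```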
To add to and expand on the excellent answer by @Marco: one major capability that the driver API makes available is loading kernels at runtime. This is covered by the module portion of the driver API; here is the overview:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#module
With the runtime API, all kernels are automatically loaded during initialization and stay loaded for as long as the program runs. With the driver API, the programmer has explicit control over loading and unloading kernels. This can be used, for instance, to download updated kernel versions from the Internet. Another use is keeping only the currently relevant modules loaded, although this is rarely a concern given the typically small size of kernels relative to the rest of the program.
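The load/unload cycle can be sketched as follows. To keep the example self-contained, the "updated kernel version" is simulated by generating PTX text in memory with a hypothetical helper `make_ptx`; a real application might instead read the PTX from a file or a download. The kernel name `write_value`, the helpers `init_driver` and `run_version`, and the PTX header values are all assumptions for illustration.

```cuda
// Build (assumed): nvcc reload_demo.cu -o reload_demo -lcuda
#include <cstdio>
#include <string>
#include <cuda.h>

// Generates PTX for a kernel that stores `value` into the int it is given.
// Stands in for a kernel image obtained at runtime.
std::string make_ptx(int value) {
    return std::string(
        ".version 6.0\n"
        ".target sm_50\n"
        ".address_size 64\n"
        ".visible .entry write_value(.param .u64 write_value_param_0)\n"
        "{\n"
        "    .reg .b32 %r<2>;\n"
        "    .reg .b64 %rd<3>;\n"
        "    ld.param.u64 %rd1, [write_value_param_0];\n"
        "    cvta.to.global.u64 %rd2, %rd1;\n"
        "    mov.u32 %r1, ") + std::to_string(value) + ";\n"
        "    st.global.u32 [%rd2], %r1;\n"
        "    ret;\n"
        "}\n";
}

// Initializes the driver and makes device 0's primary context current.
// Returns 0 on success, 1 on failure.
int init_driver() {
    CUdevice dev;
    CUcontext ctx;
    if (cuInit(0) != CUDA_SUCCESS) return 1;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return 1;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return 1;
    return cuCtxSetCurrent(ctx) == CUDA_SUCCESS ? 0 : 1;
}

// Loads the module for `value`, runs its kernel once, then unloads the module.
// Returns the int written by the kernel, or -1 on error.
int run_version(int value) {
    std::string ptx = make_ptx(value);
    CUmodule mod;
    if (cuModuleLoadData(&mod, ptx.c_str()) != CUDA_SUCCESS) return -1;

    int result = -1;
    CUfunction fn;
    CUdeviceptr out = 0;
    if (cuModuleGetFunction(&fn, mod, "write_value") == CUDA_SUCCESS &&
        cuMemAlloc(&out, sizeof(int)) == CUDA_SUCCESS) {
        void* args[] = {&out};  // one pointer per kernel parameter
        int host = -1;
        if (cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, args, nullptr) == CUDA_SUCCESS &&
            cuMemcpyDtoH(&host, out, sizeof(int)) == CUDA_SUCCESS) {
            result = host;
        }
        cuMemFree(out);
    }
    cuModuleUnload(mod);  // explicit unload: the kernel is gone after this
    return result;
}

int main() {
    if (init_driver() != 0) return 1;
    int a = run_version(1);  // "old" kernel version
    int b = run_version(2);  // "updated" kernel version, loaded after the old one was unloaded
    std::printf("first module wrote %d, second module wrote %d\n", a, b);
    return (a == 1 && b == 2) ? 0 : 1;
}
```

Swapping `cuModuleLoadData` for `cuModuleLoad` would load the image from a file path instead of from memory.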
[Update: deleted irrelevant stuff]