CUDA Driver API vs. CUDA runtime

Tags: c++, c#, cuda, gpgpu, cuda.net

When writing CUDA applications, you can work either at the driver level or at the runtime level, as illustrated in this image (the libraries are CUFFT and CUBLAS for advanced math):

[Image: CUDA layer model (source: tomshw.it)]

I assume the tradeoff between the two is increased performance for the low-level API at the cost of more complex code. What are the concrete differences, and are there any significant things you cannot do with the high-level API?

I am using CUDA.net for interop with C#, and it is built as a copy of the driver API. This encourages writing a lot of rather complex code in C#, while the C++ equivalent would be simpler using the runtime API. Is there anything to gain by doing it this way? The one benefit I can see is that it is easier to integrate intelligent error handling with the rest of the C# code.


asked Oct 28 '08 11:10

Morten Christiansen

2 Answers

The CUDA runtime makes it possible to compile and link your CUDA kernels into executables. This means that you don't have to distribute cubin files with your application, or deal with loading them through the driver API. As you have noted, it is generally easier to use.
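To illustrate the point above, here is a minimal sketch of a runtime API program in which the kernel is compiled and linked into the executable by nvcc. The kernel name `vec_add`, the buffer sizes, and the `CHECK` macro are hypothetical choices for illustration, and a reasonably recent toolkit is assumed.

```cuda
// Build (assumed): nvcc -o vec_add vec_add.cu
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(e));                      \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

// Hypothetical kernel: c[i] = a[i] + b[i].
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n, 0.0f);

    // No explicit initialization, context, or module handling:
    // the runtime sets these up implicitly on the first call.
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    CHECK(cudaMalloc((void**)&da, bytes));
    CHECK(cudaMalloc((void**)&db, bytes));
    CHECK(cudaMalloc((void**)&dc, bytes));
    CHECK(cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice));

    // Execution configuration syntax, available only in code compiled by nvcc.
    const int block = 256;
    const int grid = (n + block - 1) / block;
    vec_add<<<grid, block>>>(da, db, dc, n);
    CHECK(cudaGetLastError());

    CHECK(cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost));
    std::printf("c[0] = %f\n", hc[0]);

    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return 0;
}
```

Nothing besides the resulting executable has to be shipped, since the device code is embedded in it.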

In contrast, the driver API is harder to program but provides more control over how CUDA is used. The programmer has to deal directly with initialization, module loading, etc.
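As a rough sketch of what that extra work looks like, the host program below does the initialization, context setup, module loading, and kernel launch by hand. It assumes a recent toolkit (`cuLaunchKernel` did not exist in the earliest CUDA releases), a hypothetical file `vec_add.cubin` built separately (for example with `nvcc -cubin` for the target architecture), and a kernel declared as `extern "C" __global__ void vec_add(const float*, const float*, float*, int)` so that its name is not mangled.

```cuda
// Host code only. Build (assumed): g++ host.cpp -o host -lcuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda.h>

// Hypothetical helper: abort with a message if a driver call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        CUresult r = (call);                                          \
        if (r != CUDA_SUCCESS) {                                      \
            const char* msg = nullptr;                                \
            cuGetErrorString(r, &msg);                                \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         msg ? msg : "unknown error");                \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

int main() {
    // Explicit initialization and context management.
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Explicit module management: load the cubin and look up the kernel.
    CUmodule mod;
    CHECK(cuModuleLoad(&mod, "vec_add.cubin"));        // hypothetical file name
    CUfunction fn;
    CHECK(cuModuleGetFunction(&fn, mod, "vec_add"));   // hypothetical kernel name

    int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n, 0.0f);

    CUdeviceptr da, db, dc;
    CHECK(cuMemAlloc(&da, bytes));
    CHECK(cuMemAlloc(&db, bytes));
    CHECK(cuMemAlloc(&dc, bytes));
    CHECK(cuMemcpyHtoD(da, ha.data(), bytes));
    CHECK(cuMemcpyHtoD(db, hb.data(), bytes));

    // Kernel parameters and launch configuration are passed by function call.
    void* args[] = { &da, &db, &dc, &n };
    const unsigned block = 256;
    const unsigned grid = (n + block - 1) / block;
    CHECK(cuLaunchKernel(fn, grid, 1, 1, block, 1, 1,
                         0 /* shared memory bytes */, nullptr /* default stream */,
                         args, nullptr));

    CHECK(cuMemcpyDtoH(hc.data(), dc, bytes));
    std::printf("c[0] = %f\n", hc[0]);

    cuMemFree(da);
    cuMemFree(db);
    cuMemFree(dc);
    cuModuleUnload(mod);
    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```

The host side here needs no nvcc at all, which is also why wrappers for other languages tend to mirror this API.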

At the time of writing, more detailed device information could be queried through the driver API than through the runtime API. For instance, the free memory available on the device could be queried only through the driver API (cuMemGetInfo); later toolkit releases added a runtime equivalent, cudaMemGetInfo.
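A minimal sketch of the driver API memory query, assuming device 0 and using the device's primary context (the variable names are illustrative only):

```cuda
// Build (assumed): g++ meminfo.cpp -o meminfo -lcuda
#include <cstdio>
#include <cuda.h>

int main() {
    if (cuInit(0) != CUDA_SUCCESS) return 1;

    CUdevice dev;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return 1;

    // cuMemGetInfo reports on the current context, so one must be bound first.
    CUcontext ctx;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return 1;
    if (cuCtxSetCurrent(ctx) != CUDA_SUCCESS) return 1;

    size_t free_bytes = 0, total_bytes = 0;
    if (cuMemGetInfo(&free_bytes, &total_bytes) == CUDA_SUCCESS) {
        std::printf("free: %zu MiB of %zu MiB\n",
                    free_bytes >> 20, total_bytes >> 20);
    }

    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```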

From the CUDA Programmer's Guide:

It is composed of two APIs:

A low-level API called the CUDA driver API,

A higher-level API called the CUDA runtime API that is implemented on top of the CUDA driver API.

These APIs are mutually exclusive: An application should use either one or the other.

The CUDA runtime eases device code management by providing implicit initialization, context management, and module management. The C host code generated by nvcc is based on the CUDA runtime (see Section 4.2.5), so applications that link to this code must use the CUDA runtime API.

In contrast, the CUDA driver API requires more code, is harder to program and debug, but offers a better level of control and is language-independent since it only deals with cubin objects (see Section 4.2.5). In particular, it is more difficult to configure and launch kernels using the CUDA driver API, since the execution configuration and kernel parameters must be specified with explicit function calls instead of the execution configuration syntax described in Section 4.2.3. Also, device emulation (see Section 4.5.2.9) does not work with the CUDA driver API.

There is no noticeable performance difference between the APIs. How your kernels use memory and how they are laid out on the GPU (in warps and blocks) will have a much more pronounced effect.

answered Sep 19 '22 23:09

mch

I have found that for deployment of libraries in multi-threaded applications, the control over the CUDA context provided by the driver API was critical. Most of my clients want to integrate GPU acceleration into existing applications, and these days almost all applications are multi-threaded. Since I could not guarantee that all GPU code would be initialized, executed and deallocated from the same thread, I had to use the driver API.

My initial attempts with various work-arounds in the runtime API all led to failure, sometimes in spectacular fashion - I found I could repeatedly, instantly reboot a machine by performing just the wrong set of CUDA calls from different threads.

Since we migrated everything over to the driver API, all has been well.
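One way explicit context control can look with a current toolkit is sketched below: a single context is obtained once and then bound in each worker thread with `cuCtxSetCurrent` before any other driver call. This is only an illustration under those assumptions, not the code described above; the `worker` function, the thread count, and the buffer size are hypothetical.

```cuda
// Build (assumed): g++ threads.cpp -o threads -lcuda -pthread
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda.h>

// Hypothetical worker: binds the shared context to the calling thread,
// then allocates, clears, and frees a device buffer.
static void worker(CUcontext ctx, int id) {
    if (cuCtxSetCurrent(ctx) != CUDA_SUCCESS) {
        std::fprintf(stderr, "thread %d: could not bind context\n", id);
        return;
    }
    const size_t bytes = 1 << 20;
    CUdeviceptr buf;
    if (cuMemAlloc(&buf, bytes) != CUDA_SUCCESS) {
        std::fprintf(stderr, "thread %d: allocation failed\n", id);
        return;
    }
    cuMemsetD8(buf, 0, bytes);
    cuMemFree(buf);
    std::printf("thread %d: done\n", id);
}

int main() {
    if (cuInit(0) != CUDA_SUCCESS) return 1;

    CUdevice dev;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return 1;

    // One context, owned by the main thread and shared with the workers.
    CUcontext ctx;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return 1;

    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(worker, ctx, i);
    for (auto& t : threads) t.join();

    // Release only after every thread that used the context has finished.
    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```

Because the context handle is passed around explicitly, it does not matter which thread initializes, uses, or tears down the GPU resources, as long as the context is bound first.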

answered Sep 18 '22 23:09

Jason Dale
