The first cudaMalloc call is slow (about 0.2 s), apparently because of some initialization work on the GPU. Is there any function that solely does the initialization, so that I can separate out that time? cudaSetDevice seems to reduce the time to 0.15 s, but it still does not eliminate all the init overhead.
cudaMalloc is a function that can be called from the host or the device to allocate memory on the device, much like malloc does for the host. Memory allocated with cudaMalloc must be freed with cudaFree.
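For reference, a minimal host-side sketch of that pairing; the buffer name and the 1 MiB size here are arbitrary:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    // Allocate 1 MiB of device memory, then release it. The error code is
    // checked because the first CUDA call can also surface init failures.
    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, 1 << 20);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(d_buf);  // every cudaMalloc must be paired with a cudaFree
    return 0;
}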
__global__ : a qualifier added to standard C. It alerts the compiler that a function should be compiled to run on the device (GPU) instead of the host (CPU).
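A small illustrative kernel; the scale function and its launch configuration are made up for the example:

// __global__ marks device code that the host launches with the
// <<<grid, block>>> syntax.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Launched from the host, e.g.:
//   scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);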
Device code can also allocate and free memory dynamically from a fixed-size heap in global memory. The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory, or NULL if insufficient memory exists to fulfill the request.
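A sketch of what that looks like inside a kernel; the scratch_demo kernel is illustrative:

// Device-side allocation (requires compute capability 2.0+): each thread
// grabs a small scratch buffer from the device heap and must free it
// itself; malloc returns NULL if the heap is exhausted.
__global__ void scratch_demo(int n)
{
    int *tmp = (int *)malloc(n * sizeof(int));
    if (tmp == NULL)
        return;            // heap exhausted; nothing to clean up
    for (int i = 0; i < n; ++i)
        tmp[i] = i;
    free(tmp);             // device malloc pairs with device free
}

The device heap defaults to 8 MB; if a kernel needs more, the host can raise the limit before launch with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes).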
To execute any CUDA program, there are three main steps (a minimal sketch follows the list):

1. Copy the input data from host memory to device memory, also known as the host-to-device transfer.
2. Load the GPU program and execute it, caching data on-chip for performance.
3. Copy the results from device memory back to host memory, also called the device-to-host transfer.
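A minimal program covering all three steps might look like the following; the increment kernel and the 256-element buffer are placeholders:

#include <cuda_runtime.h>
#include <string.h>

__global__ void increment(int *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        v[i] += 1;
}

int main(void)
{
    const int n = 256;
    int h_data[n];
    memset(h_data, 0, sizeof(h_data));

    int *d_data;
    cudaMalloc((void **)&d_data, sizeof(h_data));

    // 1. Host-to-device transfer
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    // 2. Load and execute the GPU program
    increment<<<1, n>>>(d_data, n);

    // 3. Device-to-host transfer
    cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return 0;
}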
A call to
cudaFree(0);
is the canonical way to force lazy context establishment in the CUDA runtime. You can't reduce the overhead itself; that is a function of driver, runtime, and operating system latencies. But the call above lets you control how and when those overheads occur during program execution.
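As a sketch, the timing pattern this implies; now_sec is a hypothetical POSIX timing helper, so this assumes a Linux-like platform:

#include <cuda_runtime.h>
#include <stdio.h>
#include <time.h>

// Pay the context-establishment cost up front with cudaFree(0), so a
// later cudaMalloc measures only the allocation itself.
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    double t0 = now_sec();
    cudaFree(0);                 // forces lazy context creation
    printf("context init: %.3f s\n", now_sec() - t0);

    void *p;
    t0 = now_sec();
    cudaMalloc(&p, 1 << 20);     // now timed without the init overhead
    printf("cudaMalloc:   %.3f s\n", now_sec() - t0);

    cudaFree(p);
    return 0;
}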
EDIT in 2015 to add that the heuristics of context initialisation in the runtime API have subtly changed over time, so that cudaSetDevice now establishes a context; the cudaFree(0) call isn't explicitly required to initialise a context, and you can use cudaSetDevice instead. Also note that some set-up time will still be incurred at the first kernel launch, whereas before this wasn't the case. For kernel timing, it is best to include a warm-up call before launching the kernel you will time, to remove this set-up latency. The various profiling tools appear to have enough granularity built in to avoid this without any extra API calls or kernel launches.
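A warm-up pattern along those lines, using CUDA events for the measurement; kernel_to_time is a placeholder for the kernel being profiled:

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void kernel_to_time(void) { /* work being measured */ }

int main(void)
{
    cudaSetDevice(0);             // establishes the context (post-2015 behaviour)

    kernel_to_time<<<1, 1>>>();   // warm-up launch absorbs the set-up latency
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    kernel_to_time<<<1, 1>>>();   // the launch actually being timed
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}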