Advantage of using a CUDA Stream

I am trying to understand where a stream might help me when processing multiple regions of interest (ROIs) on a video frame. If I use NPP functions that support a stream, is this a case where one would launch as many streams as there are ROIs, possibly even creating a CPU thread for each stream? Or is the benefit in using one stream to process all the ROIs, possibly using this single stream from multiple CPU threads?

asked Feb 01 '17 00:02

AeroClassics

1 Answer

In CUDA, streams generally help to utilize the GPU better in two ways. First, memory copies between host and device can overlap with kernel execution if the copy and the kernel are issued in different streams. Second, individual kernels running in different streams can overlap with each other if there are enough free resources on the GPU.
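As a minimal sketch of the one-stream-per-ROI idea (not NPP-specific), the code below gives each ROI its own stream and queues an upload, a kernel, and a download in it. The kernel `invertRoi`, the function `processRoisInStreams`, the `CHECK` macro, and the buffer sizes are hypothetical and only for illustration. The host buffers come from `cudaMallocHost` so that the `cudaMemcpyAsync` calls can actually run asynchronously. Whether any overlap happens in practice depends on the GPU and on how much work each ROI involves.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA call and bail out of the caller.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t e_ = (call);                                           \
        if (e_ != cudaSuccess) {                                           \
            std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                         cudaGetErrorString(e_));                          \
            return false;                                                  \
        }                                                                  \
    } while (0)

// Hypothetical per-ROI work: invert every 8-bit pixel of the ROI.
__global__ void invertRoi(unsigned char* px, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) px[i] = 255 - px[i];
}

// Processes numRois buffers of roiPixels bytes each, one stream per ROI.
// Returns true if every call succeeded and every result is as expected.
// (A sketch: resources are not released on the early-return error path.)
bool processRoisInStreams(int numRois, int roiPixels) {
    std::vector<cudaStream_t> streams(numRois);
    std::vector<unsigned char*> hostBuf(numRois, nullptr);
    std::vector<unsigned char*> devBuf(numRois, nullptr);

    for (int r = 0; r < numRois; ++r) {
        CHECK(cudaStreamCreate(&streams[r]));
        // Page-locked host memory, needed for truly asynchronous copies.
        CHECK(cudaMallocHost((void**)&hostBuf[r], roiPixels));
        CHECK(cudaMalloc((void**)&devBuf[r], roiPixels));
        for (int i = 0; i < roiPixels; ++i)
            hostBuf[r][i] = (unsigned char)((i + r) % 256);
    }

    const int threads = 256;
    const int blocks = (roiPixels + threads - 1) / threads;

    // Queue copy -> kernel -> copy for each ROI in its own stream.
    // Work in different streams is allowed (not guaranteed) to overlap.
    for (int r = 0; r < numRois; ++r) {
        CHECK(cudaMemcpyAsync(devBuf[r], hostBuf[r], roiPixels,
                              cudaMemcpyHostToDevice, streams[r]));
        invertRoi<<<blocks, threads, 0, streams[r]>>>(devBuf[r], roiPixels);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(hostBuf[r], devBuf[r], roiPixels,
                              cudaMemcpyDeviceToHost, streams[r]));
    }

    bool ok = true;
    for (int r = 0; r < numRois; ++r) {
        CHECK(cudaStreamSynchronize(streams[r]));  // wait for this ROI only
        for (int i = 0; i < roiPixels; ++i)
            if (hostBuf[r][i] != (unsigned char)(255 - (i + r) % 256)) ok = false;
        CHECK(cudaFreeHost(hostBuf[r]));
        CHECK(cudaFree(devBuf[r]));
        CHECK(cudaStreamDestroy(streams[r]));
    }
    return ok;
}

int main() {
    bool ok = processRoisInStreams(4, 1 << 16);
    std::printf("%s\n", ok ? "all ROIs processed" : "failure");
    return ok ? 0 : 1;
}
```

A profiler timeline (for example in Nsight Systems) is the usual way to check whether the copies and kernels of different ROIs really overlap on a given device.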

Further, whether creating a thread for each ROI would help depends on how GPU utilization compares with CPU utilization (if any). If there is a lot of processing on the CPU and the CPU is holding up GPU computation, creating more threads helps.

Some further details constrain how operations in different streams can overlap (see the documentation for your version of CUDA). A memory copy overlaps with kernel execution only if the source or destination buffer in host RAM is page-locked (pinned) and the copy is issued asynchronously. Also, streams are implicitly synchronized whenever a host thread issues commands in the legacy default stream. (Since CUDA 7 each host thread can have its own default stream, so processing ROIs in different threads would help again. Note that this is opt-in, for example by compiling with `nvcc --default-stream per-thread`.)
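The thread-per-ROI variant can be sketched as follows, under the assumption that each host thread handles exactly one ROI. Instead of relying on a compile flag, the sketch passes the runtime's `cudaStreamPerThread` handle explicitly, which selects the calling thread's own default stream. The kernel `scaleRoi`, the functions `processOneRoi` and `processRoisInThreads`, and the `CHECK` macro are hypothetical names chosen for illustration.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA call and bail out of the caller.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t e_ = (call);                                           \
        if (e_ != cudaSuccess) {                                           \
            std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                         cudaGetErrorString(e_));                          \
            return false;                                                  \
        }                                                                  \
    } while (0)

// Hypothetical per-ROI work: multiply every pixel by a gain factor.
__global__ void scaleRoi(float* px, int n, float gain) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) px[i] *= gain;
}

// Runs in one host thread; all GPU work goes to that thread's own stream.
bool processOneRoi(int roiId, int n) {
    float* host = nullptr;
    float* dev = nullptr;
    const size_t bytes = n * sizeof(float);
    CHECK(cudaMallocHost((void**)&host, bytes));  // pinned, for async copies
    CHECK(cudaMalloc((void**)&dev, bytes));
    for (int i = 0; i < n; ++i) host[i] = (float)((i + roiId) % 100);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    CHECK(cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice,
                          cudaStreamPerThread));
    scaleRoi<<<blocks, threads, 0, cudaStreamPerThread>>>(dev, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost,
                          cudaStreamPerThread));
    // Blocks only this host thread, not the other ROI threads.
    CHECK(cudaStreamSynchronize(cudaStreamPerThread));

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (host[i] != 2.0f * (float)((i + roiId) % 100)) ok = false;
    CHECK(cudaFreeHost(host));
    CHECK(cudaFree(dev));
    return ok;
}

// Launches one host thread per ROI and collects the per-ROI results.
bool processRoisInThreads(int numRois, int n) {
    std::vector<int> results(numRois, 0);
    std::vector<std::thread> workers;
    for (int r = 0; r < numRois; ++r)
        workers.emplace_back([&results, r, n] { results[r] = processOneRoi(r, n); });
    for (auto& w : workers) w.join();
    for (int r = 0; r < numRois; ++r)
        if (!results[r]) return false;
    return true;
}

int main() {
    bool ok = processRoisInThreads(4, 1 << 16);
    std::printf("%s\n", ok ? "all ROIs processed" : "failure");
    return ok ? 0 : 1;
}
```

Depending on the host toolchain, linking against the threading library may need an extra flag (for example `-Xcompiler -pthread` with `nvcc` on Linux). Extra host threads mainly pay off when the CPU-side work per ROI is significant; otherwise a single thread issuing to several streams achieves similar GPU overlap.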

Hence, provided these conditions are met, processing the ROIs in different streams should improve the performance of your algorithm, up to a limit that depends on the resource consumption of the kernels, the ratio of memory copies to computation, and so on.

answered Sep 20 '22 04:09

stuhlo

Tags: parallel-processing, cuda, emgucv, opencv3.1, managed-cuda