Can we benchmark how fast CUDA or OpenCL is compared to CPU performance?

Tags: c, cuda, gpu, opencl, cpu-speed

How much faster can an algorithm written in CUDA or OpenCL run compared to a single general-purpose processor core? (Assume the algorithm is written and optimized for both the CPU and the GPU target.)

I know it depends on both the graphics card and the CPU, but say, one of NVIDIA's fastest GPUs versus a single core of an Intel i7 processor?

And I know it also depends on the type of algorithm.

I do not need a strict answer, but rather examples from experience, such as: an image manipulation algorithm using double-precision floating point and 10 operations per pixel first took 5 minutes and now runs in x seconds on this hardware.
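Any such comparison starts with a fair single-core CPU baseline. The sketch below shows one way to time it in C. It assumes a POSIX system (for `clock_gettime`). `process_pixel` and `time_cpu_baseline` are hypothetical names, and the per-pixel work is an arbitrary stand-in for the question's "10 double-precision operations per pixel", not a real image filter.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime under strict C modes */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical per-pixel operation: 5 multiply-add steps, i.e. about
   10 double-precision floating-point operations per pixel. */
static double process_pixel(double p)
{
    double acc = p;
    for (int i = 0; i < 5; ++i)
        acc = acc * 0.5 + 1.0;   /* one multiply + one add */
    return acc;
}

/* Apply process_pixel to every pixel on a single core and return the
   elapsed wall-clock time in seconds, measured with CLOCK_MONOTONIC. */
double time_cpu_baseline(const double *in, double *out, size_t n)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; ++i)
        out[i] = process_pixel(in[i]);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

Use an image-sized buffer (a few million pixels) and repeat the measurement several times, keeping the median. The first run may also include one-time costs, such as page faults on freshly allocated memory.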

asked Nov 24 '10 15:11

Roalt

1 Answer

Your question is overly broad and very difficult to answer. Moreover, only a small percentage of algorithms (those that work without much shared state) are feasible on GPUs.

But I do want to urge you to be critical of claims. I work in image processing and have read many articles on the subject, but quite often in the GPU case, the time to upload the input data to the GPU and download the results back to main memory is not included when calculating the speedup factor.

There are a few cases where this doesn't matter (both are small, or a second-stage calculation further reduces the size of the result). Usually, though, one does have to transfer both the initial data and the results.

I've seen this turn a claimed gain into a loss, because the upload/download time alone was longer than the time the main CPU needed for the whole calculation.
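The difference between the two ways of reporting a speedup is simple arithmetic. The sketch below assumes you have already measured the CPU time, the kernel time and both transfer times yourself. For example, you could use CUDA events via `cudaEventRecord`/`cudaEventElapsedTime`, or OpenCL event profiling via `clGetEventProfilingInfo`. The struct and function names are made up for illustration.

```c
/* Measured timings for one GPU run, in seconds (hypothetical struct). */
struct gpu_timing {
    double upload;    /* host -> device copy of the input      */
    double kernel;    /* time spent in the GPU kernel(s) alone */
    double download;  /* device -> host copy of the results    */
};

/* The figure often reported: CPU time divided by kernel time only. */
double kernel_only_speedup(double cpu_seconds, struct gpu_timing g)
{
    return cpu_seconds / g.kernel;
}

/* The end-to-end figure, with both transfers included. */
double end_to_end_speedup(double cpu_seconds, struct gpu_timing g)
{
    return cpu_seconds / (g.upload + g.kernel + g.download);
}
```

Take some purely hypothetical numbers: 2.0 s on the CPU, 0.1 s in the kernel and 1.5 s for each transfer. `kernel_only_speedup` reports 20x, while `end_to_end_speedup` gives about 0.65x, so the GPU version is actually slower.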

Much the same applies to combining results from several GPU cards.

Update: Newer GPUs seem to be able to upload/download and compute at the same time using ping-pong buffers. But the advice to check the boundary conditions of such benchmarks thoroughly still stands. There is a lot of spin out there.
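Here is a rough model of why overlapping helps, and where it stops helping. Split the data into chunks and use double-buffered (ping-pong) input and output buffers. The upload of one chunk, the computation of the previous chunk and the download of the one before that can then run concurrently. This sketch makes two assumptions: the hardware can really run all three stages at once, and every chunk takes the same time per stage. Some GPUs have only a single copy engine, so their uploads and downloads serialize. The function names are hypothetical.

```c
/* Serial: each chunk is uploaded, processed and downloaded in turn. */
double serial_time(int n_chunks, double up, double compute, double down)
{
    return n_chunks * (up + compute + down);
}

/* Idealized three-stage pipeline: once it is full, each further chunk
   costs only the slowest stage (upload, compute or download). */
double pipelined_time(int n_chunks, double up, double compute, double down)
{
    if (n_chunks <= 0)
        return 0.0;
    double slowest = up;
    if (compute > slowest) slowest = compute;
    if (down > slowest) slowest = down;
    /* First chunk passes through all stages, the rest follow at the
       pace of the bottleneck stage. */
    return (up + compute + down) + (n_chunks - 1) * slowest;
}
```

For example, 10 chunks at 1 time unit per stage take 30 units serially but 12 when pipelined. The pipelined total can never drop below `n_chunks` times the slowest stage. A job whose transfers dominate therefore stays transfer-bound even with perfect overlap.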

Update 2: Using a GPU that also drives the video output for this is often not optimal. Consider, e.g., adding a low-budget card for video and using the onboard video for GPGPU tasks.

answered Oct 02 '22 19:10

Marco van de Voort
