I've been struggling to get OpenCV's CUDA support to improve performance for operations such as erode/dilate and frame differencing when I read a video from an AVI file. Typically I get half the FPS on the GPU (GTX 580) compared to the CPU (AMD 955BE). Before you ask whether I'm measuring FPS correctly: the lag on the GPU is clearly visible to the naked eye, especially at high erode/dilate levels.
It seems that I'm not reading the frames in parallel? Here is the code:
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <stdlib.h>
#include <stdio.h>

using namespace cv;
using namespace cv::gpu;

Mat cpuSrc;
GpuMat src, dst;

int element_shape = MORPH_RECT;

// the address of variable which receives trackbar position update
int max_iters = 10;
int open_close_pos = 0;
int erode_dilate_pos = 0;

// callback function for open/close trackbar
void OpenClose(int)
{
    IplImage disp;
    Mat temp;
    int n = open_close_pos - max_iters;
    int an = n > 0 ? n : -n;
    Mat element = getStructuringElement(element_shape, Size(an*2+1, an*2+1), Point(an, an));
    if (n < 0)
        cv::gpu::morphologyEx(src, dst, CV_MOP_OPEN, element);
    else
        cv::gpu::morphologyEx(src, dst, CV_MOP_CLOSE, element);
    dst.download(temp);
    disp = temp;
    // cvShowImage("Open/Close", &disp);
}

// callback function for erode/dilate trackbar
void ErodeDilate(int)
{
    IplImage disp;
    Mat temp;
    int n = erode_dilate_pos - max_iters;
    int an = n > 0 ? n : -n;
    Mat element = getStructuringElement(element_shape, Size(an*2+1, an*2+1), Point(an, an));
    if (n < 0)
        cv::gpu::erode(src, dst, element);
    else
        cv::gpu::dilate(src, dst, element);
    dst.download(temp);                 // copy the result back to the host for display
    disp = temp;
    cvShowImage("Erode/Dilate", &disp);
}

int main(int argc, char** argv)
{
    VideoCapture capture("TwoManLoiter.avi");

    // create windows for output images
    namedWindow("Open/Close", 1);
    namedWindow("Erode/Dilate", 1);

    open_close_pos = 3;
    erode_dilate_pos = 0;
    createTrackbar("iterations", "Open/Close", &open_close_pos, max_iters*2+1, NULL);
    createTrackbar("iterations", "Erode/Dilate", &erode_dilate_pos, max_iters*2+1, NULL);

    for (;;)
    {
        capture >> cpuSrc;
        if (cpuSrc.empty())             // stop when the video runs out of frames
            break;

        src.upload(cpuSrc);             // copy the frame to the GPU
        GpuMat grey;
        cv::gpu::cvtColor(src, grey, CV_BGR2GRAY);
        src = grey;

        ErodeDilate(erode_dilate_pos);

        int c = cvWaitKey(25);
        if ((char)c == 27)
            break;
    }
    return 0;
}
The CPU implementation is the same, minus the cv::gpu namespace and using Mat instead of GpuMat, of course.
Thanks
OpenCV includes a GPU module that contains all of the GPU-accelerated functionality.
By default, each of the OpenCV CUDA algorithms uses a single GPU. If you need to utilize multiple GPUs, you have to manually distribute the work between GPUs.
Building OpenCV without CUDA support does not perform device code compilation, so it does not require the CUDA Toolkit installed.
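As a side note, here is a minimal sketch (not from the question; it assumes the OpenCV 2.x cv::gpu API used above) of how you can check at runtime whether a CUDA device is available and select one. getCudaEnabledDeviceCount() returns 0 both when no device is present and when OpenCV was built without CUDA support:

#include <opencv2/gpu/gpu.hpp>
#include <cstdio>

int main()
{
    // 0 means no CUDA device found, or OpenCV was built without CUDA support
    int devices = cv::gpu::getCudaEnabledDeviceCount();
    if (devices == 0)
    {
        std::printf("No CUDA-enabled device available.\n");
        return 1;
    }
    cv::gpu::setDevice(0);                  // cv::gpu calls in this thread use device 0
    cv::gpu::DeviceInfo info(0);
    std::printf("Found %d device(s), using %s\n", devices, info.name().c_str());
    return 0;
}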
My guess would be that the performance gain from the GPU erode/dilate is outweighed by the memory operations of transferring the image to and from the GPU every frame. Keep in mind that memory bandwidth is a crucial factor in GPGPU algorithms, and even more so the bandwidth between the CPU and the GPU.
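One way to check that guess is to time the upload, the kernel and the download separately. A rough sketch (not the poster's code; it uses the 2.x cv::gpu API and assumes an 8-bit single-channel frame, as in the question's grayscale conversion):

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <cstdio>
using namespace cv;

void timeOneFrame(const Mat& grey, const Mat& element)
{
    gpu::GpuMat d_src, d_dst;
    const double freq = getTickFrequency();

    int64 t0 = getTickCount();
    d_src.upload(grey);                 // host -> device copy over PCIe
    int64 t1 = getTickCount();
    gpu::erode(d_src, d_dst, element);  // the actual GPU morphology
    int64 t2 = getTickCount();
    Mat result;
    d_dst.download(result);             // device -> host copy over PCIe
    int64 t3 = getTickCount();

    printf("upload %.2f ms, erode %.2f ms, download %.2f ms\n",
           (t1 - t0) * 1000.0 / freq,
           (t2 - t1) * 1000.0 / freq,
           (t3 - t2) * 1000.0 / freq);
}

If the two copy times dominate the kernel time, the transfers are the bottleneck, and the win comes from keeping data resident on the GPU across operations rather than from faster kernels.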
EDIT: To optimize this, you could write your own image display routine (instead of cvShowImage) that uses OpenGL and displays the image as an OpenGL texture. In that case you don't need to read the processed image from the GPU back to the CPU, and you can use an OpenGL texture/buffer directly as a CUDA image/buffer, so the image never has to be copied off the GPU. You might have to manage the CUDA resources yourself, though. With this method you could also use PBOs to upload the video into the texture and benefit a little from asynchronous transfers.
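In OpenCV 2.4 you may not even need a hand-written OpenGL routine: if OpenCV was built with OpenGL support (WITH_OPENGL=ON), a window created with the CV_WINDOW_OPENGL flag can display a GpuMat directly, so the download()/cvShowImage() round trip disappears. A hedged sketch of that idea (not the poster's code; element size is arbitrary):

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/gpu/gpu.hpp>
using namespace cv;

int main()
{
    gpu::setGlDevice(0);                        // bind CUDA device 0 to the OpenGL context
    namedWindow("Erode/Dilate", CV_WINDOW_OPENGL);

    VideoCapture capture("TwoManLoiter.avi");
    Mat frame;
    gpu::GpuMat d_frame, d_grey, d_dst;
    Mat element = getStructuringElement(MORPH_RECT, Size(5, 5));

    for (;;)
    {
        capture >> frame;
        if (frame.empty())
            break;
        d_frame.upload(frame);                  // the only host -> device copy per frame
        gpu::cvtColor(d_frame, d_grey, CV_BGR2GRAY);
        gpu::erode(d_grey, d_dst, element);
        imshow("Erode/Dilate", d_dst);          // rendered as an OpenGL texture, no download
        if (waitKey(25) == 27)
            break;
    }
    return 0;
}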