
OpenCV CUDA running slower than OpenCV CPU

I've been struggling to get OpenCV CUDA to improve performance for things like erode/dilate, frame differencing, etc. when I read in a video from an AVI file. Typically I get half the FPS on the GPU (GTX 580) that I get on the CPU (AMD 955BE). Before you ask whether I'm measuring FPS correctly: you can clearly see the lag on the GPU with the naked eye, especially at a high erode/dilate level.

It seems that I'm not reading in the frames in parallel? Here is the code:

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <stdlib.h>
#include <stdio.h>

using namespace cv;
using namespace cv::gpu;

Mat cpuSrc;
GpuMat src, dst;

int element_shape = MORPH_RECT;

//the address of variable which receives trackbar position update
int max_iters = 10;
int open_close_pos = 0;
int erode_dilate_pos = 0;

// callback function for open/close trackbar
void OpenClose(int)
{
    IplImage disp;
    Mat temp;
    int n = open_close_pos - max_iters;
    int an = n > 0 ? n : -n;
    Mat element = getStructuringElement(element_shape, Size(an*2+1, an*2+1), Point(an, an) );
    if( n < 0 )
        cv::gpu::morphologyEx(src, dst, CV_MOP_OPEN, element);
    else
        cv::gpu::morphologyEx(src, dst, CV_MOP_CLOSE, element);

    dst.download(temp);
    disp = temp;
    // cvShowImage("Open/Close",&disp);
}

// callback function for erode/dilate trackbar
void ErodeDilate(int)
{
    IplImage disp;
    Mat temp;
    int n = erode_dilate_pos - max_iters;
    int an = n > 0 ? n : -n;
    Mat element = getStructuringElement(element_shape, Size(an*2+1, an*2+1), Point(an, an) );
    if( n < 0 )
        cv::gpu::erode(src, dst, element);
    else
        cv::gpu::dilate(src, dst, element);
    dst.download(temp);
    disp = temp;
    cvShowImage("Erode/Dilate",&disp);
}


int main( int argc, char** argv )
{

    VideoCapture capture("TwoManLoiter.avi");

    //create windows for output images
    namedWindow("Open/Close",1);
    namedWindow("Erode/Dilate",1);

    open_close_pos = 3;
    erode_dilate_pos = 0;
    createTrackbar("iterations", "Open/Close",&open_close_pos,max_iters*2+1,NULL);
    createTrackbar("iterations", "Erode/Dilate",&erode_dilate_pos,max_iters*2+1,NULL);

    for(;;)
    {
        capture >> cpuSrc;
        if( cpuSrc.empty() )   // stop at end of video
            break;

        src.upload(cpuSrc);
        GpuMat grey;
        cv::gpu::cvtColor(src, grey, CV_BGR2GRAY);
        src = grey;

        ErodeDilate(erode_dilate_pos);
        int c = cvWaitKey(25);

        if( (char)c == 27 )
            break;
    }

    return 0;
}

The CPU implementation is the same, minus the cv::gpu namespace and with Mat instead of GpuMat, of course.

Thanks

asked Sep 23 '11 by user779328



1 Answer

My guess would be that the performance gain from the GPU erode/dilate is outweighed by the memory operations of transferring the image to and from the GPU every frame. Keep in mind that memory bandwidth is a crucial factor in GPGPU algorithms, and even more so the bandwidth between CPU and GPU.

EDIT: To optimize it, you might write your own image display routine (instead of cvShowImage) that uses OpenGL and just displays the image as an OpenGL texture. In this case you don't need to read the processed image from the GPU back to the CPU, and you can directly use an OpenGL texture/buffer as a CUDA image/buffer, so you don't even need to copy the image inside the GPU. But in this case you might have to manage CUDA resources yourself. With this method you might also use PBOs to upload the video into the texture and benefit a bit from asynchrony.

answered Oct 05 '22 by Christian Rau