
OpenCV GPU Farneback optical flow works badly in multi-threading

My application uses the OpenCV GPU class gpu::FarnebackOpticalFlow to compute the optical flow between pairs of consecutive frames of an input video. To speed up the process, I used OpenCV's TBB support to run the method in multiple threads. However, the multi-threaded version does not produce the same result as the single-threaded one. To give an idea of the difference, here are two snapshots, from the single-threaded and the multi-threaded implementation respectively.

[Image: single-threaded optical flow] [Image: multi-threaded optical flow]

The multi-threaded implementation splits the image into 8 horizontal stripes (one per core on my PC) and applies the GPU Farneback optical flow method to each stripe. Here is the corresponding code for both versions:

Single-threaded implementation

/* main.cpp */
//prevImg and img are the input Mat images extracted from the input video
...
GpuMat gpuImg8U(img);
GpuMat gpuPrevImg8U(prevImg);   
GpuMat u_flow, v_flow;
gpu::FarnebackOpticalFlow farneback_flow;
farneback_flow.numLevels = maxLayer;
farneback_flow.pyrScale = 0.5;
farneback_flow.winSize = windows_size;
farneback_flow.numIters = of_iterations;
farneback_flow(gpuPrevImg8U,gpuImg8U,u_flow,v_flow);
getFlowField(Mat(u_flow),Mat(v_flow),optical_flow);

...
}

void getFlowField(const Mat& u, const Mat& v, Mat& flowField){    
    for (int i = 0; i < flowField.rows; ++i){
        const float* ptr_u = u.ptr<float>(i);
        const float* ptr_v = v.ptr<float>(i);
        Point2f* row = flowField.ptr<Point2f>(i);

        for (int j = 0; j < flowField.cols; ++j){
            row[j].y = ptr_v[j];
            row[j].x = ptr_u[j];
        }
    }
}

Multi-threaded implementation

/* parallel.h */
class ParallelOpticalFlow : public cv::ParallelLoopBody {

    private:
        int coreNum;
        cv::gpu::GpuMat img, img2;
        cv::gpu::FarnebackOpticalFlow& farneback_flow;
        const cv::gpu::GpuMat u_flow, v_flow;
        cv::Mat& optical_flow;

    public:
        ParallelOpticalFlow(int cores, cv::gpu::FarnebackOpticalFlow& flowHandler, cv::gpu::GpuMat img_, cv::gpu::GpuMat img2_, const cv::gpu::GpuMat u, const cv::gpu::GpuMat v, cv::Mat& of)
                    : coreNum(cores), farneback_flow(flowHandler), img(img_), img2(img2_), u_flow(u), v_flow(v), optical_flow(of){}

        virtual void operator()(const cv::Range& range) const;

};


/* parallel.cpp*/
void ParallelOpticalFlow::operator()(const cv::Range& range) const {

    for (int k = range.start ; k < range.end ; k ++){

        cv::gpu::GpuMat img_rect(img,cv::Rect(0,img.rows/coreNum*k,img.cols,img.rows/coreNum));
        cv::gpu::GpuMat img2_rect(img2,cv::Rect(0,img2.rows/coreNum*k,img2.cols,img2.rows/coreNum));
        cv::gpu::GpuMat u_rect(u_flow,cv::Rect(0,u_flow.rows/coreNum*k,u_flow.cols,u_flow.rows/coreNum));
        cv::gpu::GpuMat v_rect(v_flow,cv::Rect(0,v_flow.rows/coreNum*k,v_flow.cols,v_flow.rows/coreNum));
        cv::Mat of_rect(optical_flow,cv::Rect(0,optical_flow.rows/coreNum*k,optical_flow.cols,optical_flow.rows/coreNum));

        farneback_flow(img_rect,img2_rect,u_rect,v_rect);
        getFlowField(Mat(u_rect),Mat(v_rect),of_rect);
    }
}

/* main.cpp */

    parallel_for_(Range(0,cores_num),ParallelOpticalFlow(cores_num,farneback_flow,gpuPrevImg8U,gpuImg8U,u_flow,v_flow,optical_flow));
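One detail of the stripe arithmetic above is worth checking separately from the threading question: `img.rows/coreNum` is an integer division, so when the row count is not a multiple of the stripe count, the trailing rows fall outside every stripe. The following is a minimal sketch of a split that covers every row. `Stripe` and `makeStripes` are hypothetical names used only for illustration; `Stripe` stands in for `cv::Rect` so that the sketch compiles without OpenCV.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Stripe {
    int x, y, width, height;
};

// Split `rows` image rows into at most `count` horizontal stripes.
// The first (rows % count) stripes get one extra row, so no row is dropped.
std::vector<Stripe> makeStripes(int rows, int cols, int count) {
    std::vector<Stripe> stripes;
    if (rows <= 0 || cols <= 0 || count <= 0) return stripes;

    const int base = rows / count;   // rows every stripe gets
    const int extra = rows % count;  // leftover rows to distribute
    int y = 0;
    for (int k = 0; k < count; ++k) {
        const int h = base + (k < extra ? 1 : 0);
        if (h == 0) continue;        // more stripes requested than rows
        stripes.push_back({0, y, cols, h});
        y += h;
    }
    assert(y == rows);               // every row assigned exactly once
    return stripes;
}
```

With a hypothetical 1083-row frame and 8 stripes, plain `rows/coreNum` yields 8 stripes of 135 rows and leaves the last 3 rows unprocessed, whereas the sketch gives the first 3 stripes 136 rows each. Each resulting stripe could then be turned into a `cv::Rect` for the `GpuMat` sub-views.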

The code looks equivalent in the two cases. Can anyone explain why the two versions behave differently, or point out any mistakes in my code? Thanks in advance for your answers.

Marco Ferro asked Jan 25 '16 10:01


1 Answer

The GPU module is not thread-safe. It uses some global state, such as __constant__ memory and the texture reference API, which can lead to data races when it is used in a multi-threaded environment.
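If the calls have to stay inside a parallel loop, one common way to avoid this kind of race is to serialize every call into the non-thread-safe routine behind a single mutex. The sketch below illustrates the pattern only and does not use OpenCV: `unsafeCompute`, `guardedCompute`, `runStripes`, `g_scratch` and `g_gpuMutex` are hypothetical names, and `unsafeCompute` merely imitates a routine that depends on shared global state.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a routine that relies on shared global state,
// as the answer says the GPU module does. This is NOT an OpenCV call.
static int g_scratch = 0;

int unsafeCompute(int input) {
    g_scratch = input;            // write the shared state
    std::this_thread::yield();    // widen the window for interleaving
    return g_scratch * 2;         // wrong if another thread wrote in between
}

// One lock that guards every call into the non-thread-safe routine.
static std::mutex g_gpuMutex;

int guardedCompute(int input) {
    std::lock_guard<std::mutex> lock(g_gpuMutex);
    return unsafeCompute(input);
}

// Launch one worker per stripe index; each worker goes through the lock.
std::vector<int> runStripes(int count) {
    assert(count >= 0);
    std::vector<int> results(count, 0);
    std::vector<std::thread> workers;
    for (int k = 0; k < count; ++k) {
        workers.emplace_back([k, &results] { results[k] = guardedCompute(k); });
    }
    for (std::thread& t : workers) t.join();
    return results;
}
```

Applied to the question's code, the lock would presumably wrap the `farneback_flow(...)` call inside `operator()`. That makes the GPU work run one stripe at a time, so it removes the race but also the hoped-for speed-up; since the GPU already parallelizes the computation internally, a single call on the whole frame, as in the single-threaded version, is likely the simpler choice.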

jet47 answered Nov 16 '22 13:11