
super fast median of matrix in opencv (as fast as matlab)

I'm writing some code in openCV and want to find the median value of a very large matrix array (single channel grayscale, float).

I tried several methods, such as sorting the array (using std::sort) and picking the middle entry, but it is extremely slow compared with the median function in Matlab. To be precise: what takes 0.25 seconds in Matlab takes over 19 seconds in OpenCV.

My input image is originally a 12-bit grayscale image with dimensions 3840x2748 (~10.5 megapixels), converted to float (CV_32FC1) with all values mapped to the range [0,1]. At some point in the code I request the median value by calling:

double myMedianValue = medianMat(Input);

Where the function medianMat is:

double medianMat(cv::Mat Input){
    Input = Input.reshape(0, 1); // spread Input Mat to a single row
    std::vector<float> vecFromMat; // float to match CV_32FC1 (copyTo does not convert types)
    Input.copyTo(vecFromMat); // copy Input Mat to vector vecFromMat
    std::sort(vecFromMat.begin(), vecFromMat.end()); // sort vecFromMat
    if (vecFromMat.size() % 2 == 0) // even number of elements
        return (vecFromMat[vecFromMat.size() / 2 - 1] + vecFromMat[vecFromMat.size() / 2]) / 2;
    return vecFromMat[(vecFromMat.size() - 1) / 2]; // odd number of elements
}

I timed the function medianMat as a whole and also its various parts - as expected, the bottleneck is in:

std::sort( vecFromMat.begin(), vecFromMat.end() ); // sort vecFromMat

Does anyone here have an efficient solution?

Thanks!

EDIT I have tried using std::nth_element as suggested in Adi Shavit's answer.

The function medianMat now reads as:

double medianMat(cv::Mat Input){
    Input = Input.reshape(0, 1); // spread Input Mat to a single row
    std::vector<float> vecFromMat; // float to match CV_32FC1 (copyTo does not convert types)
    Input.copyTo(vecFromMat); // copy Input Mat to vector vecFromMat
    std::nth_element(vecFromMat.begin(), vecFromMat.begin() + vecFromMat.size() / 2, vecFromMat.end());
    return vecFromMat[vecFromMat.size() / 2];
}

The runtime has dropped from over 19 seconds to 3.5 seconds. This is still nowhere near the 0.25 seconds in Matlab using the median function...

asked May 06 '15 by CV_User

3 Answers

Sorting and taking the middle element is not the most efficient way to find a median. It requires O(n log n) operations.

With C++ you should use std::nth_element() and take the middle iterator. This is an O(n) operation:

nth_element is a partial sorting algorithm that rearranges elements in [first, last) such that:

  • The element pointed at by nth is changed to whatever element would occur in that position if [first, last) was sorted.
  • All of the elements before this new nth element are less than or equal to the elements after the new nth element.

Also, your original data is 12-bit integers. Your implementation does a few things that make the comparison to Matlab problematic:

  1. You converted to floating point (CV_32FC1 or double or both); this is costly and takes time.
  2. The code makes an extra copy into a vector<double>.
  3. Operations on floats, and especially doubles, cost more than on integers.

Assuming your image is continuous in memory, as is the default for OpenCV, you should use CV_16UC1 and work directly on the data array after reshape().

Another option, which should be very fast, is to simply build a histogram of the image - a single pass over the image. Then, working on the histogram, find the bin where half the pixels lie on each side - at most a single pass over the bins.

The OpenCV docs have several tutorials on how to build a histogram. Once you have the histogram, accumulate the bin values until you pass 3840x2748/2. That bin is your median.

answered Nov 20 '22 by Adi Shavit


OK.

I actually tried this before posting the question and due to some silly mistakes I disqualified it as a solution... anyway here it is:

I basically create a histogram of values for my original input with 2^12 = 4096 bins, compute the CDF, and normalize it so it is mapped from 0 to 1. I then find the smallest index in the CDF that is equal to or larger than 0.5, divide that index by nVals (2^12 = 4096), and thus obtain the requested median value. It now runs in 0.11 seconds (and that's in debug mode without heavy optimizations), which is less than half the time required in Matlab.

Here's the function (nVals = 4096 in my case corresponding with 12-bits of values):

double medianMat(cv::Mat Input, int nVals){

    // COMPUTE HISTOGRAM OF SINGLE CHANNEL MATRIX
    float range[] = { 0, (float)nVals }; // cast avoids a narrowing conversion in the braced init
    const float* histRange = { range };
    bool uniform = true; bool accumulate = false;
    cv::Mat hist;
    cv::calcHist(&Input, 1, 0, cv::Mat(), hist, 1, &nVals, &histRange, uniform, accumulate);

    // COMPUTE CUMULATIVE DISTRIBUTION FUNCTION (CDF)
    cv::Mat cdf;
    hist.copyTo(cdf);
    for (int i = 1; i < nVals; i++){
        cdf.at<float>(i) += cdf.at<float>(i - 1);
    }
    cdf /= Input.total();

    // COMPUTE MEDIAN
    double medianVal = 0;
    for (int i = 0; i < nVals; i++){
        if (cdf.at<float>(i) >= 0.5) { medianVal = i; break; }
    }
    return medianVal / nVals;
}
answered Nov 20 '22 by CV_User


It's probably faster to find it from the original data.

Since the original data has 12-bit values, there are only 4096 different possible values. That's a nice and small table! Go through all the data in one pass, counting how many of each value you have - an O(n) operation. Then the median is easy to find: just count size/2 items from either end of the table.
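The counting approach described above can be sketched roughly as follows, assuming the 12-bit samples are stored as unsigned shorts in [0, 4095] (the function name is made up for illustration):

```cpp
#include <cstddef>
#include <vector>

// Sketch of a counting-table median for 12-bit data in [0, 4095].
int medianFromCounts(const std::vector<unsigned short>& data)
{
    const int nVals = 4096;                 // 2^12 possible values
    std::vector<std::size_t> count(nVals, 0);
    for (unsigned short v : data)           // one O(n) pass over the data
        ++count[v];

    const std::size_t half = data.size() / 2;
    std::size_t seen = 0;                   // walk the bins until half the samples are covered
    for (int bin = 0; bin < nVals; ++bin) {
        seen += count[bin];
        if (seen > half)
            return bin;                     // upper median for even-sized input
    }
    return nVals - 1;                       // unreachable for non-empty input
}
```

This avoids OpenCV's calcHist machinery entirely and touches each pixel exactly once, plus at most one pass over the 4096-entry table.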

answered Nov 20 '22 by sp2danny