Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Methods to vectorise histogram in SIMD?

I am trying to implement histogram in Neon. Is it possible to vectorise ?

like image 609
Rugger Avatar asked Oct 20 '12 06:10

Rugger


2 Answers

Histogramming is almost impossible to vectorize, unfortunately.

You can probably optimise the scalar code somewhat however - a common trick is to use two histograms and then combine them at the end. This allows you to overlap loads/increments/stores and thereby bury some of the serial dependencies and associated latencies. Pseudo code:

init histogram 1 to all 0s
init histogram 2 to all 0s
loop
  get input value 1
  get input value 2
  load count for value 1 from histogram 1
  load count for value 2 from histogram 2
  increment count for histogram 1
  increment count for histogram 2
  store count for value 1 to histogram 1
  store count for value 2 to histogram 2
until done
combine histogram 1 and histogram 2
like image 105
Paul R Avatar answered Sep 28 '22 12:09

Paul R


ermig1979 has a Simd project which shows how he has done histograms using a similar approach to what @Paul-R has mentioned but also with SSE2 and AVX2 variants:

Project: https://github.com/ermig1979/Simd

Base file: https://github.com/ermig1979/Simd/blob/master/src/Simd/SimdBaseHistogram.cpp

An AVX2 implementation can be seen here: https://github.com/ermig1979/Simd/blob/master/src/Simd/SimdAvx2Histogram.cpp

A scalar solution can be seen below to illustrate the basic principle of creating multiple histograms that are summed at the end:

void Histogram(const uint8_t * src, size_t width, size_t height, size_t stride, 
    uint32_t * histogram)
{
    uint32_t histograms[4][HISTOGRAM_SIZE];
    memset(histograms, 0, sizeof(uint32_t)*HISTOGRAM_SIZE*4);
    size_t alignedWidth = Simd::AlignLo(width, 4);
    for(size_t row = 0; row < height; ++row)
    {
        size_t col = 0;
        for(; col < alignedWidth; col += 4)
        {
            ++histograms[0][src[col + 0]];
            ++histograms[1][src[col + 1]];
            ++histograms[2][src[col + 2]];
            ++histograms[3][src[col + 3]];
        }
        for(; col < width; ++col)
            ++histograms[0][src[col + 0]];

        src += stride;
    }

    for(size_t i = 0; i < HISTOGRAM_SIZE; ++i)
        histogram[i] = histograms[0][i] + histograms[1][i] + 
            histograms[2][i] + histograms[3][i];
}
like image 36
nietras Avatar answered Sep 28 '22 11:09

nietras