Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ conversion optimization

I would like to ask if there is a quicker way to do my audio conversion than by iterating through all values one by one and dividing them through 32768.

void CAudioDataItem::Convert(const vector<int>&uIntegers, vector<double> &uDoubles)
{
    for ( int i = 0; i <=uIntegers.size()-1;i++)
    {
        uDoubles[i] = uIntegers[i] / 32768.0;
    }
}

My approach works fine, but it could be quicker. However I did not find any way to speed it up.

Thank you for the help!

like image 242
tmighty Avatar asked Jun 24 '13 18:06

tmighty


2 Answers

If your array is large enough it may be worthwhile to parallelize this for loop. OpenMP's parallel for statement is what I would use.
The function would then be:

    void CAudioDataItem::Convert(const vector<int>&uIntegers, vector<double> &uDoubles)
    {
        #pragma omp parallel for
        for (int i = 0; i < uIntegers.size(); i++)
        {
            uDoubles[i] = uIntegers[i] / 32768.0;
        }
    }

with gcc you need to pass -fopenmp when you compile for the pragma to be used, on MSVC it is /openmp. Since spawning threads has a noticeable overhead, this will only be faster if you are processing large arrays, YMMV.

like image 83
SirGuy Avatar answered Sep 29 '22 18:09

SirGuy


For maximum speed you want to convert more than one value per loop iteration. The easiest way to do that is with SIMD. Here's roughly how you'd do it with SSE2:

void CAudioDataItem::Convert(const vector<int>&uIntegers, vector<double> &uDoubles)
{
    __m128d scale = _mm_set_pd( 1.0 / 32768.0, 1.0 / 32768.0 );
    int i = 0;
    for ( ; i < uIntegers.size() - 3; i += 4)
    {
        __m128i x = _mm_loadu_si128(&uIntegers[i]);
        __m128i y = _mm_shuffle_epi32(x, _MM_SHUFFLE(2,3,0,0) );
        __m128d dx = _mm_cvtepi32_pd(x);
        __m128d dy = _mm_cvtepi32_pd(y);
        dx = _mm_mul_pd(dx, scale);
        dy = _mm_mul_pd(dy, scale);
        _mm_storeu_pd(dx, &uDoubles[i]);
        _mm_storeu_pd(dy, &uDoubles[i + 2]);
    }
    // Finish off the last 0-3 elements the slow way
    for ( ; i < uIntegers.size(); i ++)
    {
        uDoubles[i] = uIntegers[i] / 32768.0;
    }
}

We process four integers per loop iteration. As we can only fit two doubles in the registers there's some duplicated work, but the extra unrolling will help performance unless the arrays are tiny.

Changing the data types to smaller ones (say short and float) should also help performance, because they cut down on memory bandwidth, and you can fit four floats in an SSE register. For audio data you shouldn't need the precision of a double.

Note that I've used unaligned loads and stores. Aligned ones will be slightly quicker if the data is actually aligned (which it won't be by default, and it's hard to make stuff aligned inside a std::vector).

like image 22
Adam Avatar answered Sep 29 '22 20:09

Adam