I would like to ask if there is a quicker way to do my audio conversion than iterating through all values one by one and dividing each of them by 32768:
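void CAudioDataItem::Convert(const vector<int>& uIntegers, vector<double>& uDoubles)
{
    // uDoubles must already hold at least uIntegers.size() elements.
    // (The original "i <= uIntegers.size() - 1" underflows when the input is empty.)
    for (size_t i = 0; i < uIntegers.size(); i++)
    {
        uDoubles[i] = uIntegers[i] / 32768.0;
    }
}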
My approach works, but I think it could be quicker. However, I haven't found any way to speed it up.
Thank you for the help!
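As a portable baseline before reaching for threads or intrinsics, here is a minimal sketch (ConvertScalar is a hypothetical free function, not the asker's class member) that sizes the output itself and multiplies by the reciprocal instead of dividing. Because 1/32768 = 2^-15 is exactly representable, the results are bit-identical to dividing. Whether this is any faster depends on the compiler: at -O2/-O3 it may already make this substitution and auto-vectorize the loop, so measure before and after.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical free-function version of Convert.
// Resizes the output so the caller does not have to pre-size it.
void ConvertScalar(const std::vector<int>& in, std::vector<double>& out)
{
    out.resize(in.size());
    // Multiply by 2^-15 instead of dividing by 32768.0: both are exact
    // power-of-two scalings, so the results are identical.
    std::transform(in.begin(), in.end(), out.begin(),
                   [](int s) { return s * (1.0 / 32768.0); });
}
```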
If your array is large enough, it may be worthwhile to parallelize this loop. OpenMP's parallel for directive is what I would use.
The function would then be:
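void CAudioDataItem::Convert(const vector<int>& uIntegers, vector<double>& uDoubles)
{
    // OpenMP 2.0 (the version MSVC supports) requires a signed loop index
    const long n = static_cast<long>(uIntegers.size());
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
    {
        uDoubles[i] = uIntegers[i] / 32768.0;
    }
}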
With gcc you need to pass -fopenmp when compiling for the pragma to take effect; on MSVC the flag is /openmp. Since spawning and synchronizing threads has noticeable overhead, this will only be faster if you are processing large arrays. YMMV.
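One way to act on the overhead caveat is OpenMP's if clause, which runs the loop serially below a size threshold. The sketch below assumes a hypothetical free function ConvertParallel and a hypothetical threshold kMinParallelSize; the right value depends on your machine and has to be measured. If you compile without -fopenmp or /openmp, the pragma is ignored and the loop simply runs serially with the same results.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical threshold: below this many samples, threading overhead
// probably outweighs the gain. Tune by measuring.
constexpr long kMinParallelSize = 1L << 16;

// Hypothetical free-function variant of the OpenMP answer.
void ConvertParallel(const std::vector<int>& in, std::vector<double>& out)
{
    out.resize(in.size());
    // Signed loop index, as required by OpenMP 2.0 (MSVC).
    const long n = static_cast<long>(in.size());
    const int* src = in.data();
    double* dst = out.data();
    // The if clause makes OpenMP run the loop on one thread for small inputs.
    #pragma omp parallel for if(n > kMinParallelSize)
    for (long i = 0; i < n; ++i)
    {
        dst[i] = src[i] / 32768.0;
    }
}
```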
For maximum speed you want to convert more than one value per loop iteration. The easiest way to do that is with SIMD. Here's roughly how you'd do it with SSE2:
#include <emmintrin.h> // SSE2 intrinsics

void CAudioDataItem::Convert(const vector<int>& uIntegers, vector<double>& uDoubles)
{
    const __m128d scale = _mm_set1_pd(1.0 / 32768.0);
    const size_t n = uIntegers.size();
    size_t i = 0;
    // "i + 4 <= n" avoids the unsigned underflow of "n - 3" when n < 3
    for (; i + 4 <= n; i += 4)
    {
        __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&uIntegers[i]));
        // _mm_cvtepi32_pd only converts the low two ints, so move elements 2 and 3 down
        __m128i y = _mm_shuffle_epi32(x, _MM_SHUFFLE(3, 2, 3, 2));
        __m128d dx = _mm_cvtepi32_pd(x);
        __m128d dy = _mm_cvtepi32_pd(y);
        dx = _mm_mul_pd(dx, scale);
        dy = _mm_mul_pd(dy, scale);
        // _mm_storeu_pd takes the destination pointer first
        _mm_storeu_pd(&uDoubles[i], dx);
        _mm_storeu_pd(&uDoubles[i + 2], dy);
    }
    // Finish off the last 0-3 elements the slow way
    for (; i < n; i++)
    {
        uDoubles[i] = uIntegers[i] / 32768.0;
    }
}
We process four integers per loop iteration. Since an SSE register only holds two doubles, each iteration needs an extra shuffle plus two conversions, two multiplies and two stores, but the extra unrolling should still help performance unless the arrays are tiny.
Changing the data types to smaller ones (say short and float) should also help performance, because they reduce memory traffic and you can fit four floats in an SSE register. For audio data you shouldn't need the precision of a double.
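Here is a minimal SSE2 sketch of the short/float variant, assuming 16-bit samples in a std::vector<std::int16_t> and an x86 target (ConvertShortToFloat is a hypothetical name). One 128-bit load holds eight shorts; they are sign-extended to two groups of four 32-bit ints, converted to float and scaled. Multiplying by 2^-15 is exact in float here, so the SIMD path and the scalar tail produce identical values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <emmintrin.h> // SSE2 intrinsics (x86 only)

// Hypothetical function: converts 16-bit samples to float in [-1, 1).
void ConvertShortToFloat(const std::vector<std::int16_t>& in, std::vector<float>& out)
{
    out.resize(in.size());
    const std::size_t n = in.size();
    const __m128 scale = _mm_set1_ps(1.0f / 32768.0f);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
    {
        __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&in[i]));
        // Unpacking x with itself puts each sample in the high half of a
        // 32-bit lane; an arithmetic shift right by 16 sign-extends it.
        __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(x, x), 16); // samples 0-3
        __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(x, x), 16); // samples 4-7
        __m128 flo = _mm_mul_ps(_mm_cvtepi32_ps(lo), scale);
        __m128 fhi = _mm_mul_ps(_mm_cvtepi32_ps(hi), scale);
        _mm_storeu_ps(&out[i], flo);
        _mm_storeu_ps(&out[i + 4], fhi);
    }
    // Remaining 0-7 samples the slow way.
    for (; i < n; ++i)
    {
        out[i] = in[i] / 32768.0f;
    }
}
```

Eight samples per iteration instead of four is where the smaller types pay off: both the load and each register carry twice as many values as in the int/double version.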
Note that I've used unaligned loads and stores. Aligned ones may be slightly quicker if the data actually is aligned (which isn't guaranteed by default, and it's awkward to control alignment inside a std::vector).
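If you do want guaranteed 16-byte alignment in a std::vector, one option is a custom allocator. The following is a minimal C++17 sketch (AlignedAllocator, AlignedVector and IsAligned are hypothetical names) built on the standard aligned forms of operator new and operator delete; it is not production-hardened.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Minimal allocator that over-aligns every allocation to Align bytes.
template <typename T, std::size_t Align = 16>
struct AlignedAllocator
{
    static_assert(Align >= alignof(T) && (Align & (Align - 1)) == 0,
                  "Align must be a power of two and at least alignof(T)");
    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    // Needed because allocator_traits cannot rebind a non-type parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    T* allocate(std::size_t n)
    {
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) noexcept
    {
        ::operator delete(p, std::align_val_t{Align});
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

template <typename T>
using AlignedVector = std::vector<T, AlignedAllocator<T, 16>>;

// Helper to check a pointer's alignment.
inline bool IsAligned(const void* p, std::size_t align)
{
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}
```

With such vectors, and i stepping by 4 in the SSE2 loop, &uIntegers[i], &uDoubles[i] and &uDoubles[i + 2] all fall on 16-byte boundaries, so _mm_load_si128 and _mm_store_pd could replace the unaligned versions. The catch is that Convert's parameter types would have to change to the aligned vector type.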