
How to divide a __m256i vector by an integer variable?

I want to divide an AVX2 vector by a constant. I looked at this question and many other pages, and came across fixed-point arithmetic, which might help, but I didn't understand it. The division is the bottleneck in my code. I tried two approaches:

First, casting to float and doing the division with an AVX instruction:

//outside the bottleneck:
__m256i veci16; // containing some integer numbers (16x16-bit numbers)
__m256 div_v = _mm256_set1_ps(div);

//inside the bottleneck
//some calculations which make veci16
vecps = _mm256_castsi256_ps (veci16);
vecps = _mm256_div_ps (vecps, div_v);
veci16 = _mm256_castps_si256 (vecps);
_mm256_storeu_si256((__m256i *)&output[i][j], veci16);

The problem with the first method is speed: without the division the elapsed time is 5 ns, and with it the elapsed time is about 60 ns.
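A note on the snippet above: `_mm256_castsi256_ps` only reinterprets the bits of the register as floats and does not convert the integer values, whereas a conversion intrinsic such as `_mm256_cvtepi32_ps` changes the representation. The scalar sketch below illustrates the difference for a single 32-bit lane. The helper names are hypothetical, and it assumes IEEE 754 single-precision floats. Small integers reinterpreted this way become subnormal floats, which are often slow to process on x86 hardware. That might contribute to the 60 ns figure, but it is an assumption and was not measured here.

```c
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Reinterpret the 32 bits of an integer as a float. This models what a
   cast intrinsic such as _mm256_castsi256_ps does to each 32-bit lane. */
static float bits_as_float(int32_t x)
{
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}

/* Numeric conversion of the integer value. This models what
   _mm256_cvtepi32_ps does to each 32-bit lane. */
static float value_as_float(int32_t x)
{
    return (float)x;
}

/* Nonzero if f is a subnormal (denormal) float. */
static int is_subnormal(float f)
{
    return fpclassify(f) == FP_SUBNORMAL;
}
```

For the integer 100, `value_as_float` gives 100.0f, while `bits_as_float` gives a tiny subnormal value, so dividing the reinterpreted bits does not divide the original integers.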

Second, I stored the vector to an array, divided each element, and loaded it back like this:

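int16_t t[16]; // 16 x 16-bit lanes (needs <stdint.h>); int would have the wrong element width
inline __m256i _mm256_div_epi16 (__m256i a , int b){

    // unaligned store/load, since t is not guaranteed to be 32-byte aligned
    _mm256_storeu_si256((__m256i *)&t[0] , a);
    t[0]/=b; t[1]/=b; t[2]/=b; t[3]/=b; t[4]/=b; t[5]/=b; t[6]/=b; t[7]/=b;
    t[8]/=b; t[9]/=b; t[10]/=b; t[11]/=b; t[12]/=b; t[13]/=b; t[14]/=b; t[15]/=b;
    return _mm256_loadu_si256((__m256i *)&t[0]);
}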

This was better, but the elapsed time is still 17 ns. The surrounding calculations are too long to show here.

The question is: is there a faster way to implement this inline function?

Hossein Amiri asked Feb 24 '17 15:02


1 Answer

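You can do this with _mm256_mulhrs_epi16. This does a fixed-point multiply with rounding: each 16-bit lane of the result is (a * b + 0x4000) >> 15, computed from the 32-bit product. So you just set the multiplicand vector to 32768 / b, which makes each lane an approximation of a / b: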

inline __m256i _mm256_div_epi16 (const __m256i va, const int b)
{
    __m256i vb = _mm256_set1_epi16(32768 / b);
    return _mm256_mulhrs_epi16(va, vb);
}

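Note that this assumes b > 1: for b = 1 the multiplicand would be 32768, which does not fit in a signed 16-bit lane. The result is also rounded to nearest rather than truncated, and because 32768 / b is itself truncated, it can differ slightly from exact integer division.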
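To make the fixed-point arithmetic concrete, here is a minimal scalar sketch of what one 16-bit lane computes. The helper names `mulhrs_lane` and `div_lane` are hypothetical and only for illustration. The sketch also assumes that `>>` on a negative value is an arithmetic shift, which is the case on mainstream compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of _mm256_mulhrs_epi16:
   form the 32-bit product, add the rounding constant 0x4000,
   then scale down by 2^15. */
static int16_t mulhrs_lane(int16_t a, int16_t m)
{
    int32_t product = (int32_t)a * (int32_t)m;
    return (int16_t)((product + 0x4000) >> 15);
}

/* Hypothetical helper: approximate a / b by multiplying with the
   fixed-point reciprocal 32768 / b.
   Assumes 1 < b <= 32767 so that the reciprocal fits in int16_t. */
static int16_t div_lane(int16_t a, int b)
{
    int16_t reciprocal = (int16_t)(32768 / b);
    return mulhrs_lane(a, reciprocal);
}
```

For example, `div_lane(100, 4)` gives 25 and `div_lane(100, 3)` gives 33, matching integer division. On the other hand, `div_lane(3, 2)` gives 2 and `div_lane(30000, 7)` gives 4286, where truncating division would give 1 and 4285, because the multiply rounds to nearest.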

Paul R answered Sep 21 '22 23:09