Check XMM register for all zeroes

Is there a way to check if all bits/bytes/words etc. in a __m128i variable are 0?
In my app I have to check whether all integers packed in a __m128i variable are zero. Do I have to extract them and compare each one separately?


Edit:

What I am doing now is:

int next = 0;
do{
    //some code

    next = idata.m128i_i32[0] + idata.m128i_i32[1] + idata.m128i_i32[2] + idata.m128i_i32[3];
}while(next > 0);

What I need is to check if idata is all zeroes without having to access each individual element, and quit the loop if they are...


Based on Harold's comment this is the solution:

__m128i idata = _mm_setr_epi32(i,j,k,l);
do{
    //some code
}while( !_mm_testz_si128(idata, idata) );

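This exits the loop once all 128 bits of idata are 0: _mm_testz_si128(idata, idata) returns 1 only when idata AND idata has no bits set, which means idata itself is zero. Thanks, harold!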
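To make the exit condition concrete: _mm_testz_si128(a, b) returns 1 when a AND b has no bits set, so testing idata against itself covers every bit of the register, not just the low bit of each dword. The following is only a minimal sketch of that loop. The names all_zero_sse41, count_iterations_sse41 and the SSE41_TARGET macro are hypothetical, the loop body is a stand-in for "some code", and it assumes a CPU with SSE4.1 and non-negative inputs.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_testz_si128 (also pulls in the SSE2 intrinsics)

// Assumption: on GCC/Clang, enable SSE4.1 per function so the file builds
// without -msse4.1. MSVC needs no attribute.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// Hypothetical helper: true when every one of the 128 bits in v is 0.
SSE41_TARGET inline bool all_zero_sse41(__m128i v) {
    // PTEST sets ZF when (v AND v) == 0, i.e. when v itself is zero.
    return _mm_testz_si128(v, v) != 0;
}

// Hypothetical loop: counts iterations until all four lanes reach 0.
// Assumes i, j, k, l are non-negative; otherwise the loop would not terminate.
SSE41_TARGET inline int count_iterations_sse41(int i, int j, int k, int l) {
    __m128i idata = _mm_setr_epi32(i, j, k, l);
    int iterations = 0;
    do {
        // Stand-in for "some code": subtract 1 from every lane that is still positive.
        __m128i positive = _mm_cmpgt_epi32(idata, _mm_setzero_si128());  // -1 where lane > 0
        idata = _mm_add_epi32(idata, positive);
        ++iterations;
    } while (!_mm_testz_si128(idata, idata));  // keep looping while any bit is set
    return iterations;
}
```

For example, all_zero_sse41(_mm_set1_epi32(2)) is false even though the low bit of each dword is 0, and count_iterations_sse41(3, 1, 0, 2) runs the body three times.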

Daniel Gruszczyk asked Apr 16 '12 14:04



2 Answers

_mm_testz_si128 requires SSE4.1, which some CPUs do not support (e.g. older Intel Atom, AMD Phenom).

Here is an SSE2-compatible variant:

inline bool isAllZeros(__m128i xmm) {
    return _mm_movemask_epi8(_mm_cmpeq_epi8(xmm, _mm_setzero_si128())) == 0xFFFF;
}
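How this works: _mm_cmpeq_epi8 sets a byte to 0xFF wherever the input byte equals zero, and _mm_movemask_epi8 gathers the top bit of each of the 16 bytes into an integer, so the result is 0xFFFF only when all 16 bytes are zero. A small sketch that exposes the intermediate mask may help; zeroByteMask is a hypothetical helper added for illustration, and isAllZeros is restated so the block compiles on its own.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: bit n of the result is 1 when byte n of xmm is zero.
inline int zeroByteMask(__m128i xmm) {
    __m128i isZero = _mm_cmpeq_epi8(xmm, _mm_setzero_si128());  // 0xFF per zero byte
    return _mm_movemask_epi8(isZero);                           // one bit per byte
}

// Same check as in the answer, expressed through the helper.
inline bool isAllZeros(__m128i xmm) {
    return zeroByteMask(xmm) == 0xFFFF;  // all 16 bytes compared equal to zero
}
```

With _mm_setr_epi32(1, 0, 0, 0) only byte 0 is nonzero, so the mask is 0xFFFE; with _mm_setr_epi32(0, 0, 0, 0x01000000) only byte 15 is nonzero, so the mask is 0x7FFF. In both cases isAllZeros returns false.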
Marat Dukhan answered Sep 27 '22 22:09


As Paul R commented on my original post:

"You don't need to initialise a dummy argument for the second parameter of PTEST, i.e. instead of _mm_testz_si128(idata, _mm_set1_epi32(0xFFFF)) you can just test a value against itself."

ptest does the entire job with one instruction.

This helped.

Daniel Gruszczyk answered Sep 27 '22 23:09