
SIMD/SSE: How to check that all vector elements are non-zero

I need to check that all elements of a vector are non-zero. So far I have found the following solution. Is there a better way to do this? I am using gcc 4.8.2 on Linux/x86_64, with instructions up to SSE4.2 available.

typedef char ChrVect __attribute__((vector_size(16), aligned(16)));

inline bool testNonzero(ChrVect vect)
{
    const ChrVect vzero = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
    return (0 == (__int128_t)(vzero == vect));
}

Update: the code above compiles to the following assembly (when compiled as a non-inline function):

movdqa  %xmm0, -24(%rsp)
pxor    %xmm0, %xmm0
pcmpeqb -24(%rsp), %xmm0
movdqa  %xmm0, -24(%rsp)
movq    -24(%rsp), %rax
orq     -16(%rsp), %rax
sete    %al
ret
Daniel Frużyński asked Dec 08 '15 12:12


1 Answer

With straight SSE intrinsics you might do it like this:

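#include <stdint.h>
#include <emmintrin.h>  // SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8
#ifdef __SSE4_1__
#include <smmintrin.h>  // SSE4.1: _mm_testz_si128
#endif

inline bool testNonzero(__m128i v)
{
    // vcmp holds 0xFF in every byte position where v is zero
    __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());
#ifdef __SSE4_1__  // for SSE 4.1 and later use PTEST
    return _mm_testz_si128(vcmp, vcmp);  // 1 when vcmp is all zeros
#else           // for older SSE use PMOVMSKB
    uint32_t mask = _mm_movemask_epi8(vcmp);
    return (mask == 0);
#endif
}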
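
As a rough way to convince yourself the intrinsic version behaves as intended, it can be cross-checked against a plain byte loop. The sketch below is only an illustration: it uses the SSE2 (PMOVMSKB) path so that it builds on any x86_64 target, and the names testNonzeroSse2, testNonzeroScalar and testNonzeroChecked are hypothetical helpers, not part of the original answer.

```cpp
#include <emmintrin.h>  // SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8, _mm_loadu_si128
#include <cassert>
#include <cstdint>

// SSE2-only variant of the check: true when none of the 16 bytes is zero.
inline bool testNonzeroSse2(__m128i v)
{
    __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());  // 0xFF where byte == 0
    return _mm_movemask_epi8(vcmp) == 0;                    // no zero byte found
}

// Hypothetical scalar reference, used only to cross-check the SIMD result.
inline bool testNonzeroScalar(const std::uint8_t (&bytes)[16])
{
    for (std::uint8_t b : bytes)
        if (b == 0) return false;
    return true;
}

// Hypothetical helper: unaligned load of 16 bytes, SIMD check, and an
// assertion that the SIMD and scalar results agree.
inline bool testNonzeroChecked(const std::uint8_t (&bytes)[16])
{
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    bool simd = testNonzeroSse2(v);
    assert(simd == testNonzeroScalar(bytes));
    return simd;
}
```

The unaligned load (_mm_loadu_si128) is used so the helper accepts any 16-byte array; if the data is known to be 16-byte aligned, _mm_load_si128 would work as well.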

I suggest looking at what your compiler currently generates for your existing code, then comparing it with this version using intrinsics to see whether there is any significant difference.

With SSE3 (clang -O3 -msse3) I get the following for the above function:

pxor    %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
pmovmskb    %xmm0, %ecx
testl   %ecx, %ecx

The SSE4 version (clang -O3 -msse4.1) produces:

pxor    %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
ptest   %xmm0, %xmm0

Note that the zeroing of xmm1 will typically be hoisted out of any loop containing this function, so the above sequences should be reduced by one instruction when used inside a loop.
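
To illustrate the loop case, here is a minimal sketch that scans a byte buffer 16 bytes at a time. It assumes SSE2 only, and the function name allBytesNonzero and its pointer/length interface are hypothetical, chosen for the example. The zero vector is written once before the loop, which is the same place a compiler would normally hoist it to.

```cpp
#include <emmintrin.h>  // SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8, _mm_loadu_si128
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when every byte in data[0..len) is non-zero.
inline bool allBytesNonzero(const std::uint8_t* data, std::size_t len)
{
    assert(data != nullptr || len == 0);

    const __m128i vzero = _mm_setzero_si128();  // loop-invariant zero vector
    std::size_t i = 0;

    // Full 16-byte blocks: one compare and one movemask per block.
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        __m128i vcmp = _mm_cmpeq_epi8(v, vzero);  // 0xFF where byte == 0
        if (_mm_movemask_epi8(vcmp) != 0)
            return false;                         // at least one zero byte
    }

    // Remaining tail of fewer than 16 bytes, checked one byte at a time.
    for (; i < len; ++i)
        if (data[i] == 0)
            return false;

    return true;
}
```

Returning early on the first block that contains a zero keeps the common failing case cheap; whether that branch is worth it compared with OR-ing the compare results and testing once at the end depends on the data and is best measured.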

Paul R answered Oct 30 '22 23:10