SIMD/SSE: How to check that all vector elements are non-zero

Question

I need to check that all vector elements are non-zero. So far I have found the following solution. Is there a better way to do this? I am using gcc 4.8.2 on Linux/x86_64, with instructions up to SSE4.2.

typedef char ChrVect __attribute__((vector_size(16), aligned(16)));

inline bool testNonzero(ChrVect vect)
{
    const ChrVect vzero = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
    return (0 == (__int128_t)(vzero == vect));
}

Update: the code above compiles to the following assembly (when compiled as a non-inline function):

movdqa  %xmm0, -24(%rsp)
pxor    %xmm0, %xmm0
pcmpeqb -24(%rsp), %xmm0
movdqa  %xmm0, -24(%rsp)
movq    -24(%rsp), %rax
orq -16(%rsp), %rax
sete    %al
ret

Paul R · Accepted Answer

With straight SSE intrinsics you might do it like this:

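#include <stdint.h>
#include <emmintrin.h>  // SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8
#ifdef __SSE4_1__
#include <smmintrin.h>  // SSE4.1: _mm_testz_si128
#endif

inline bool testNonzero(__m128i v)
{
    // Bytes of v that are zero become 0xFF in vcmp; all other bytes become 0x00.
    __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());
#ifdef __SSE4_1__  // for SSE 4.1 and later use PTEST
    return _mm_testz_si128(vcmp, vcmp);  // 1 if vcmp is all zero, i.e. v has no zero byte
#else              // for older SSE use PMOVMSKB
    uint32_t mask = _mm_movemask_epi8(vcmp);  // one bit per byte of vcmp
    return (mask == 0);
#endif
}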

I suggest looking at what your compiler currently generates for your existing code, then comparing it with this version using intrinsics to see if there is any significant difference.

With SSE3 (clang -O3 -msse3) I get the following for the above function:

pxor    %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
pmovmskb    %xmm0, %ecx
testl   %ecx, %ecx

The SSE4 version (clang -O3 -msse4.1) produces:

pxor    %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
ptest   %xmm0, %xmm0

Note that the zeroing of xmm1 will typically be hoisted out of any loop containing this function, so the above sequences should be reduced by one instruction when used inside a loop.
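To illustrate the loop case, here is a minimal sketch of a buffer scan built on the PMOVMSKB variant. The helper name allBytesNonzero, the use of unaligned loads, and the requirement that the length be a multiple of 16 are assumptions made for this example and are not part of the original answer. Whether the zero register is actually hoisted out of the loop depends on the compiler and optimization level, so it is worth checking the generated assembly.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 only, which is baseline on x86_64

// Same test as above, using the PMOVMSKB path.
static inline bool testNonzero(__m128i v)
{
    __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    return _mm_movemask_epi8(vcmp) == 0;
}

// Hypothetical helper: true if none of the first n bytes of buf is zero.
// Assumes n is a multiple of 16; unaligned loads keep the sketch simple.
static bool allBytesNonzero(const std::uint8_t *buf, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(buf + i));
        if (!testNonzero(v))
            return false;  // this 16-byte block contains a zero byte
    }
    return true;
}
```

After inlining, the loop body should come down to a load, a compare, and a mask test per 16-byte block, with the zero constant set up once before the loop.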

Tags: c++, c, vectorization, gcc, simd

Daniel Frużyński
