Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

128bit hash comparison with SSE


In my current project, I have to compare 128bit values (actually md5 hashes) and I thought it would be possible to accelerate the comparison by using SSE instructions. My problem is that I can't manage to find good documentation on SSE instructions; I'm searching for a 128bit integer comparison instruction that let me know if one hash is larger, smaller or equal to another. Does such an instruction exists?

PS: The targeted machines are x86_64 servers with SSE2 instructions; I'm also interested in a NEON instruction for the same job.

like image 286
fokenrute Avatar asked Dec 26 '10 14:12

fokenrute


3 Answers

There are no 128-bit integer comparison instructions in the SSE or NEON instruction sets.

SSE4.1 added vector 64-bit integer comparisons: PCMPEQQ and PCMPGTQ, but because of the way they are implemented it is not straightforward to piece two of them together into a 128-bit comparison.

The preferred way to accomplish a 128-bit comparison on x86_64 is to use a 64-bit comparison on the high word, then an additional 64-bit comparison on the low word only if the high words compare equal:

    cmp {ahi}, {bhi}
    jne  0f
    cmp {alo}, {blo}
0:  // flags are now set as though a comparison of unsigned 128-bit values
    // was performed; signed comparisons are a bit different.

On ARM, the usual idiom is a sequence of conditional comparisons word-by-word to set the flags as needed.

like image 88
Stephen Canon Avatar answered Nov 14 '22 20:11

Stephen Canon


Actually the 128 bit comparison of two values a and b is possible using SSE 4.1 with two instructions and a spare register set to zero before.

In x86 assembly, using legacy 128 bit SSE:

    pxor    %xmm2, %xmm2     # set xmm2 to zero. Should be moved out of the loop.

    # compare %xmm0 to %xmm1 for equality
    pxor    %xmm0, %xmm1     # xmm1 is zero if both operands are equal
    ptest   %xmm2, %xmm1     # test not(xmm2) and xmm1. If any bit in xmm1 is set
    jc      equal            # the carry flag is cleared.
not_equal:
    ...        
equal:

Using intrinsics in C is preferred, as they will automatically benefit from AVX 3 operands syntax, which does actually save a considerable amount of SSE register moves.

static const __m128i zero = {0};

inline bool compare128(__m128i a, __m128i b) {
    __m128i c = _mm_xor_si128(a, b);
    return _mm_testc_si128(zero, c);
}

This compiles to something similar as above, especially the bool temporary gets folded and the carry flag is used directly.

like image 45
Gunther Piez Avatar answered Nov 14 '22 22:11

Gunther Piez


PCMPGT will not compare the entire 128 bits, it will always work with smaller units and produce separate results. Additionally it works on signed values, which complicates things further.

If you are running in 64 bit mode, I think it would be fastest to use two native 64 bit subtracts or compares.

Not sure why you can't find the documentation, it is all in the intel instruction set reference.

like image 2
Jester Avatar answered Nov 14 '22 22:11

Jester