Can counting byte matches between two strings be optimized using SIMD?

Question

Profiling suggests that this function is a real bottleneck in my application:

static inline int countEqualChars(const char* string1, const char* string2, int size) {
    int r = 0;
    for (int j = 0; j < size; ++j) {
        if (string1[j] == string2[j]) {
            ++r;
        }
    }

    return r;
}

Even with -O3 and -march=native, G++ 4.7.2 does not vectorize this function (I checked the assembler output). Now, I'm not an expert with SSE and friends, but I think that comparing more than one character at once should be faster. Any ideas on how to speed things up? Target architecture is x86-64.

Stephen Canon · Accepted Answer

Of course it can.

pcmpeqb compares two vectors of 16 bytes and produces a vector with 0 in each byte where they differ and -1 (0xFF) in each byte where they match. Use this to compare 16 bytes at a time, adding the result to an accumulator vector. Each byte lane of the accumulator can only hold 255 matches, so accumulate the results of at most 255 vector compares before summing the accumulator into a wider total and resetting it. When you're done, there are 16 partial results in the accumulator. Sum them and negate to get the number of equal elements.
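The approach described above could be sketched with SSE2 intrinsics roughly as follows. This is a minimal illustration, not a tuned implementation: the name countEqualCharsSSE2 and the use of std::size_t for the length and result are assumptions of this sketch (the original function uses int). It also subtracts each compare result from the accumulator instead of adding and negating at the end, which is equivalent because subtracting -1 adds 1, and it uses psadbw (_mm_sad_epu8) for the horizontal sum of the 16 byte lanes.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cstddef>

// Hypothetical name: counts the positions i < size where a[i] == b[i].
static inline std::size_t countEqualCharsSSE2(const char* a, const char* b,
                                              std::size_t size) {
    const __m128i zero = _mm_setzero_si128();
    std::size_t total = 0;
    std::size_t i = 0;

    while (size - i >= 16) {
        // Process at most 255 vectors per batch so that no byte lane
        // of the accumulator can overflow.
        std::size_t vectors = (size - i) / 16;
        if (vectors > 255) vectors = 255;

        __m128i acc = zero;
        for (std::size_t v = 0; v < vectors; ++v, i += 16) {
            // Unaligned loads, since the inputs may not be 16-byte aligned.
            __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
            __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
            // pcmpeqb yields 0xFF (-1) in matching lanes, 0 elsewhere;
            // subtracting -1 increments the lane's match count.
            acc = _mm_sub_epi8(acc, _mm_cmpeq_epi8(va, vb));
        }

        // psadbw against zero sums each group of 8 bytes into a 64-bit lane.
        __m128i sums = _mm_sad_epu8(acc, zero);
        total += static_cast<std::size_t>(_mm_cvtsi128_si32(sums));
        total += static_cast<std::size_t>(
            _mm_cvtsi128_si32(_mm_srli_si128(sums, 8)));
    }

    // Scalar loop for the remaining 0..15 bytes.
    for (; i < size; ++i) {
        total += (a[i] == b[i]);
    }
    return total;
}
```

Whether this beats the scalar loop depends on the input lengths, so it is worth measuring on representative data before adopting it.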

If the lengths are very short, it will be hard to get a significant speedup from this approach. If the lengths are long, then it will be worth pursuing.

Tags: c++, optimization, x86-64, simd, sse

Milan