
Why is strcmp not SIMD optimized?

I've tried to compile this program on an x64 computer:

#include <cstring>

int main(int argc, char* argv[])
{
  return ::std::strcmp(argv[0],
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really long string"
  );
}

I compiled it like this:

g++ -std=c++11 -msse2 -O3 -g a.cpp -o a 

But the resulting disassembly is like this:

   0x0000000000400480 <+0>:     mov    (%rsi),%rsi
   0x0000000000400483 <+3>:     mov    $0x400628,%edi
   0x0000000000400488 <+8>:     mov    $0x22d,%ecx
   0x000000000040048d <+13>:    repz cmpsb %es:(%rdi),%ds:(%rsi)
   0x000000000040048f <+15>:    seta   %al
   0x0000000000400492 <+18>:    setb   %dl
   0x0000000000400495 <+21>:    sub    %edx,%eax
   0x0000000000400497 <+23>:    movsbl %al,%eax
   0x000000000040049a <+26>:    retq

Why is no SIMD used? I suppose it could be used to compare, say, 16 chars at once. Should I write my own SIMD strcmp, or is that a nonsensical idea for some reason?

user1095108 asked Oct 27 '14 10:10


1 Answer

In an SSE2 implementation, how should the compiler make sure that no memory accesses happen past the end of the string? It has to know the length first, and this requires scanning the string for the terminating zero byte.

If you scan for the length of the string, you have already done most of the work of a strcmp function. Therefore there is no benefit in using SSE2.
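To make the over-read concern concrete, here is a rough sketch of what a hand-written 16-bytes-at-a-time SSE2 comparison would have to deal with. It is an illustration under stated assumptions, not how any particular compiler or libc does it: the function and helper names are made up, it assumes x86 with 4 KiB pages, and it only avoids faulting by dropping to a byte-by-byte compare whenever a 16-byte load would cross a page boundary. It still reads bytes after the terminating zero inside the same page, which the C++ standard does not permit and which a compiler therefore cannot assume is safe in general.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: true if a 16-byte load starting at p would touch the
// next page. The 4 KiB page size is an assumption of this sketch.
static bool near_page_end(const char* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 4095u) > 4096u - 16u;
}

// Hypothetical name. Returns <0, 0 or >0, like strcmp.
int strcmp_sse2_sketch(const char* a, const char* b) {
    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        if (near_page_end(a) || near_page_end(b)) {
            // A 16-byte load could fault here, so compare a single byte.
            const unsigned char ca = static_cast<unsigned char>(*a);
            const unsigned char cb = static_cast<unsigned char>(*b);
            if (ca != cb || ca == 0) return ca - cb;
            ++a;
            ++b;
            continue;
        }
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
        // Bit i of eq is set where a[i] == b[i]; bit i of nul where a[i] == 0.
        const unsigned eq = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)));
        const unsigned nul = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(va, zero)));
        // Bit i of stop is set where the bytes differ or the string ends.
        unsigned stop = (~eq | nul) & 0xFFFFu;
        if (stop != 0) {
            unsigned i = 0;
            while ((stop & 1u) == 0) {  // index of the lowest set bit
                stop >>= 1;
                ++i;
            }
            return static_cast<unsigned char>(a[i]) - static_cast<unsigned char>(b[i]);
        }
        a += 16;
        b += 16;
    }
}
```

Note that the zero-byte check and the page check are extra work on every iteration, on top of the actual comparison.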

However, Intel added instructions for string handling in the SSE4.2 instruction set. These handle the terminating zero byte problem. For a nice write-up on them, read this blog post:

http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
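For a rough idea of how those instructions are used, here is a sketch built on the `_mm_cmpistri` family of intrinsics (the PCMPISTRI instruction), which treats a zero byte as the end of each 16-byte operand. The function and helper names are hypothetical, the `target` attribute is GCC/Clang-specific, and the 4 KiB page check is my own assumption: the instruction detects the terminator inside the data it was given, but the 16-byte loads themselves can still run past the end of the string, so the sketch falls back to single bytes near a page boundary.

```cpp
#include <cstdint>
#include <nmmintrin.h>  // SSE4.2 intrinsics

// Hypothetical helper: true if a 16-byte load starting at p would touch the
// next page. The 4 KiB page size is an assumption of this sketch.
static bool crosses_page_16(const char* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 4095u) > 4096u - 16u;
}

// Hypothetical name. Returns <0, 0 or >0, like strcmp.
#if defined(__GNUC__)
__attribute__((target("sse4.2")))  // lets this compile without -msse4.2
#endif
int strcmp_sse42_sketch(const char* a, const char* b) {
    // Compare bytes position by position and report the first mismatch.
    // A position where only one operand has already ended counts as a mismatch.
    constexpr int mode = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                         _SIDD_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT;
    for (;;) {
        if (crosses_page_16(a) || crosses_page_16(b)) {
            // A 16-byte load could fault here, so compare a single byte.
            const unsigned char ca = static_cast<unsigned char>(*a);
            const unsigned char cb = static_cast<unsigned char>(*b);
            if (ca != cb || ca == 0) return ca - cb;
            ++a;
            ++b;
            continue;
        }
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
        if (_mm_cmpistrc(va, vb, mode)) {
            // Carry flag set: a mismatch exists; cmpistri gives its index.
            const int i = _mm_cmpistri(va, vb, mode);
            return static_cast<unsigned char>(a[i]) - static_cast<unsigned char>(b[i]);
        }
        if (_mm_cmpistrz(va, vb, mode)) {
            // No mismatch, and vb contains a zero byte: both strings ended.
            return 0;
        }
        a += 16;
        b += 16;
    }
}
```

The code needs a CPU with SSE4.2; a real implementation would check for that at run time before calling it.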

Nils Pipenbrinck answered Sep 28 '22 02:09