Why is memcmp so much faster than a for loop check?

Why is memcmp(a, b, size) so much faster than:

    for (i = 0; i < nelements; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;

Is memcmp a CPU instruction or something? It must be pretty deep because I got a massive speedup using memcmp over the loop.

871

asked Jan 14 '14 05:01

jsj

1 Answer

memcmp is often implemented in assembly to take advantage of a number of architecture-specific features, which can make it much faster than a simple loop in C.

As a "builtin"

GCC supports memcmp (as well as a ton of other functions) as builtins. In some versions / configurations of GCC, a call to memcmp will be recognized as __builtin_memcmp. Instead of emitting a call to the memcmp library function, GCC will emit a handful of instructions to act as an optimized inline version of the function.

On x86, this uses the cmpsb instruction, which compares a byte at one memory location with a byte at another and advances both pointers. Combined with the repe prefix, the comparison repeats until the bytes differ or the count is exhausted, which is exactly what memcmp does.

Given the following code:

    #include <string.h>

    int test(const void* s1, const void* s2, int count) {
        return memcmp(s1, s2, count) == 0;
    }

gcc version 3.4.4 on Cygwin generates the following assembly:

    ; (prologue)
    mov     esi, [ebp+arg_0]    ; Move first pointer to esi
    mov     edi, [ebp+arg_4]    ; Move second pointer to edi
    mov     ecx, [ebp+arg_8]    ; Move length to ecx

    cld                         ; Clear DF, the direction flag, so comparisons happen
                                ; at increasing addresses
    cmp     ecx, ecx            ; Special case: set ZF up front, so that if the length
                                ; is zero (no bytes compared) the result is "equal"
    repe cmpsb                  ; Compare bytes at DS:ESI and ES:EDI, setting flags
                                ; Repeat this while equal (ZF is set)
    setz    al                  ; Set al (return value) to 1 if ZF is still set
                                ; (all bytes were equal).
    ; (epilogue)
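To make the flag handling easier to follow, here is a rough C model of what that instruction sequence computes. It is only a sketch: the function name equal_bytewise and the zf variable are made up for illustration, and real compiler output will not look like this.

```c
#include <stddef.h>

/* Hypothetical helper: models `cmp ecx, ecx`, `repe cmpsb`, `setz al`. */
int equal_bytewise(const void *s1, const void *s2, size_t count)
{
    const unsigned char *p = s1;   /* plays the role of ESI */
    const unsigned char *q = s2;   /* plays the role of EDI */
    int zf = 1;                    /* cmp ecx, ecx leaves ZF set */

    while (count != 0) {           /* repe: stop when the count hits zero ... */
        zf = (*p++ == *q++);       /* cmpsb: compare one byte, advance both */
        count--;
        if (!zf)                   /* ... or when a pair of bytes differs */
            break;
    }
    return zf;                     /* setz al */
}
```

With a count of zero the loop body never runs and the function returns 1, which is the special case the cmp ecx, ecx line exists to handle.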

Reference:

cmpsb instruction

As a library function

Highly-optimized versions of memcmp exist in many C standard libraries. These will usually take advantage of architecture-specific instructions to work with lots of data in parallel.
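The idea of handling several bytes per step can be sketched in portable C by comparing 8 bytes at a time. This is only an illustration of the principle, not how any particular libc does it: the name equal_wordwise is hypothetical, it only answers "equal or not" rather than returning memcmp's ordering, and real implementations use much wider vector registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, equality only: returns 1 if the buffers match. */
int equal_wordwise(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1;
    const unsigned char *q = s2;

    /* Compare 8 bytes per iteration. Copying into locals with memcpy
       avoids unaligned-access and aliasing problems; compilers
       typically turn each copy into a single load. */
    while (n >= sizeof(uint64_t)) {
        uint64_t a, b;
        memcpy(&a, p, sizeof a);
        memcpy(&b, q, sizeof b);
        if (a != b)
            return 0;
        p += sizeof a;
        q += sizeof b;
        n -= sizeof a;
    }

    /* Finish the remaining tail one byte at a time. */
    while (n--) {
        if (*p++ != *q++)
            return 0;
    }
    return 1;
}
```

For large equal buffers this does roughly one eighth as many comparisons as the byte loop in the question, which hints at why the SSE versions, working on 16 bytes at once, do better still.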

In Glibc, there are versions of memcmp for x86_64 that can take advantage of the following instruction set extensions:

SSE2 - sysdeps/x86_64/memcmp.S
SSE4 - sysdeps/x86_64/multiarch/memcmp-sse4.S
SSSE3 - sysdeps/x86_64/multiarch/memcmp-ssse3.S

The cool part is that glibc will detect (at run-time) the newest instruction set your CPU has, and execute the version optimized for it. See this snippet from sysdeps/x86_64/multiarch/memcmp.S:

    ENTRY(memcmp)
        .type   memcmp, @gnu_indirect_function
        LOAD_RTLD_GLOBAL_RO_RDX
        HAS_CPU_FEATURE (SSSE3)
        jnz 2f
        leaq    __memcmp_sse2(%rip), %rax
        ret

    2:  HAS_CPU_FEATURE (SSE4_1)
        jz  3f
        leaq    __memcmp_sse4_1(%rip), %rax
        ret

    3:  leaq    __memcmp_ssse3(%rip), %rax
        ret

    END(memcmp)
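A similar "pick once, then call through a pointer" pattern can be approximated in plain C. The sketch below assumes GCC or Clang on x86 for the __builtin_cpu_supports check and falls back to a generic version elsewhere. The names my_memcmp, my_memcmp_generic, my_memcmp_sse4_1 and resolve_memcmp are invented for this example, and both variants simply call memcmp to keep it short.

```c
#include <stddef.h>
#include <string.h>

typedef int (*memcmp_fn)(const void *, const void *, size_t);

/* Hypothetical variants. Real ones would use different instructions;
   here both defer to memcmp so the sketch stays short. */
static int my_memcmp_generic(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

static int my_memcmp_sse4_1(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

/* Pick an implementation based on what the CPU reports. */
static memcmp_fn resolve_memcmp(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return my_memcmp_sse4_1;
#endif
    return my_memcmp_generic;
}

/* Callers use this; the choice is made on the first call only. */
int my_memcmp(const void *a, const void *b, size_t n)
{
    static memcmp_fn impl;          /* NULL until resolved */
    if (impl == NULL)
        impl = resolve_memcmp();
    return impl(a, b, n);
}
```

Unlike this sketch, which tests a pointer on every call and is not thread-safe as written, the gnu_indirect_function mechanism lets the dynamic linker run the resolver and bind the symbol to the chosen version, so ordinary calls carry no extra check.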

In the Linux kernel

Linux does not seem to have an optimized version of memcmp for x86_64, but it does for memcpy, in arch/x86/lib/memcpy_64.S. Note that it uses the alternatives infrastructure (arch/x86/kernel/alternative.c) not only to decide at runtime which version to use, but also to patch itself so that this decision is made only once, at boot-up.

129

answered Sep 17 '22 14:09

Jonathon Reinhart

Tags: performance, c, optimization, memcmp
