Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this version of strcmp slower?

Tags:

I have been trying experiment with improving performance of strcmp under certain conditions. However, I unfortunately cannot even get an implementation of plain vanilla strcmp to perform as well as the library implementation.

I saw a similar question, but the answers say the difference was from the compiler optimizing away the comparison on string literals. My test does not use string literals.

Here's the implementation (comparisons.cpp)

int strcmp_custom(const char* a, const char* b) {     while (*b == *a) {         if (*a == '\0') return 0;         a++;         b++;     }     return *b - *a; } 

And here's the test driver (driver.cpp):

#include "comparisons.h"  #include <array> #include <chrono> #include <iostream>  void init_string(char* str, int nChars) {     // 10% of strings will be equal, and 90% of strings will have one char different.     // This way, many strings will share long prefixes so strcmp has to exercise a bit.     // Using random strings still shows the custom implementation as slower (just less so).     str[nChars - 1] = '\0';     for (int i = 0; i < nChars - 1; i++)         str[i] = (i % 94) + 32;      if (rand() % 10 != 0)         str[rand() % (nChars - 1)] = 'x'; }  int main(int argc, char** argv) {     srand(1234);      // Pre-generate some strings to compare.     const int kSampleSize = 100;     std::array<char[1024], kSampleSize> strings;     for (int i = 0; i < kSampleSize; i++)         init_string(strings[i], kSampleSize);      auto start = std::chrono::high_resolution_clock::now();      for (int i = 0; i < kSampleSize; i++)         for (int j = 0; j < kSampleSize; j++)             strcmp(strings[i], strings[j]);      auto end = std::chrono::high_resolution_clock::now();     std::cout << "strcmp        - " << (end - start).count() << std::endl;      start = std::chrono::high_resolution_clock::now();      for (int i = 0; i < kSampleSize; i++)         for (int j = 0; j < kSampleSize; j++)             strcmp_custom(strings[i], strings[j]);      end = std::chrono::high_resolution_clock::now();     std::cout << "strcmp_custom - " << (end - start).count() << std::endl; } 

And my makefile:

CC=clang++  test: driver.o comparisons.o     $(CC) -o test driver.o comparisons.o  # Compile the test driver with optimizations off. driver.o: driver.cpp comparisons.h     $(CC) -c -o driver.o -std=c++11 -O0 driver.cpp  # Compile the code being tested separately with optimizations on. comparisons.o: comparisons.cpp comparisons.h     $(CC) -c -o comparisons.o -std=c++11 -O3 comparisons.cpp  clean:     rm comparisons.o driver.o test 

On the advice of this answer, I compiled my comparison function in a separate compilation unit with optimizations and compiled the driver with optimizations turned off, but I still get a slowdown of about 5x.

strcmp        - 154519 strcmp_custom - 506282 

I also tried copying the FreeBSD implementation but got similar results.

I'm wondering if my performance measurement is overlooking something. Or is the standard library implementation doing something fancier?

like image 630
kevinAlbs Avatar asked Jul 02 '17 19:07

kevinAlbs


People also ask

Is Strncmp faster than Strcmp?

If n is sufficiently large that strncmp will compare the whole strings (so that the behavior becomes effectively the same as strcmp ) then strncmp is likely to be moderately slower because it also has to keep track of a counter, but the difference might or might not be measurable or even present in a given ...

Is Strcmp faster?

While comparing the two function, in a loop repeted 500'000'000 times, strcmp execute too much fast, about x750 times faster. The execution time of that function is 3.058s , while strcmp only 0.004s .


1 Answers

I don't know which standard library you have, but just to give you an idea of how serious C library maintainers are about optimizing the string primitives, the default strcmp used by GNU libc on x86-64 is two thousand lines of hand-optimized assembly language, as of version 2.24. There are separate, also hand-optimized, versions for when the SSSE3 and SSE4.2 instruction set extensions are available. (A fair bit of the complexity in that file appears to be because the same source code is used to generate several other functions; the machine code winds up being "only" 1120 instructions.) 2.24 was released roughly a year ago, and even more work has gone into it since.

They go to this much trouble because it's common for one of the string primitives to be the single hottest function in a profile.

like image 54
zwol Avatar answered Oct 07 '22 19:10

zwol