I have been trying experiment with improving performance of strcmp
under certain conditions. However, I unfortunately cannot even get an implementation of plain vanilla strcmp
to perform as well as the library implementation.
I saw a similar question, but the answers say the difference was from the compiler optimizing away the comparison on string literals. My test does not use string literals.
Here's the implementation (comparisons.cpp)
int strcmp_custom(const char* a, const char* b) { while (*b == *a) { if (*a == '\0') return 0; a++; b++; } return *b - *a; }
And here's the test driver (driver.cpp):
#include "comparisons.h" #include <array> #include <chrono> #include <iostream> void init_string(char* str, int nChars) { // 10% of strings will be equal, and 90% of strings will have one char different. // This way, many strings will share long prefixes so strcmp has to exercise a bit. // Using random strings still shows the custom implementation as slower (just less so). str[nChars - 1] = '\0'; for (int i = 0; i < nChars - 1; i++) str[i] = (i % 94) + 32; if (rand() % 10 != 0) str[rand() % (nChars - 1)] = 'x'; } int main(int argc, char** argv) { srand(1234); // Pre-generate some strings to compare. const int kSampleSize = 100; std::array<char[1024], kSampleSize> strings; for (int i = 0; i < kSampleSize; i++) init_string(strings[i], kSampleSize); auto start = std::chrono::high_resolution_clock::now(); for (int i = 0; i < kSampleSize; i++) for (int j = 0; j < kSampleSize; j++) strcmp(strings[i], strings[j]); auto end = std::chrono::high_resolution_clock::now(); std::cout << "strcmp - " << (end - start).count() << std::endl; start = std::chrono::high_resolution_clock::now(); for (int i = 0; i < kSampleSize; i++) for (int j = 0; j < kSampleSize; j++) strcmp_custom(strings[i], strings[j]); end = std::chrono::high_resolution_clock::now(); std::cout << "strcmp_custom - " << (end - start).count() << std::endl; }
And my makefile:
CC=clang++ test: driver.o comparisons.o $(CC) -o test driver.o comparisons.o # Compile the test driver with optimizations off. driver.o: driver.cpp comparisons.h $(CC) -c -o driver.o -std=c++11 -O0 driver.cpp # Compile the code being tested separately with optimizations on. comparisons.o: comparisons.cpp comparisons.h $(CC) -c -o comparisons.o -std=c++11 -O3 comparisons.cpp clean: rm comparisons.o driver.o test
On the advice of this answer, I compiled my comparison function in a separate compilation unit with optimizations and compiled the driver with optimizations turned off, but I still get a slowdown of about 5x.
strcmp - 154519 strcmp_custom - 506282
I also tried copying the FreeBSD implementation but got similar results.
I'm wondering if my performance measurement is overlooking something. Or is the standard library implementation doing something fancier?
If n is sufficiently large that strncmp will compare the whole strings (so that the behavior becomes effectively the same as strcmp ) then strncmp is likely to be moderately slower because it also has to keep track of a counter, but the difference might or might not be measurable or even present in a given ...
While comparing the two function, in a loop repeted 500'000'000 times, strcmp execute too much fast, about x750 times faster. The execution time of that function is 3.058s , while strcmp only 0.004s .
I don't know which standard library you have, but just to give you an idea of how serious C library maintainers are about optimizing the string primitives, the default strcmp
used by GNU libc on x86-64 is two thousand lines of hand-optimized assembly language, as of version 2.24. There are separate, also hand-optimized, versions for when the SSSE3 and SSE4.2 instruction set extensions are available. (A fair bit of the complexity in that file appears to be because the same source code is used to generate several other functions; the machine code winds up being "only" 1120 instructions.) 2.24 was released roughly a year ago, and even more work has gone into it since.
They go to this much trouble because it's common for one of the string primitives to be the single hottest function in a profile.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With