Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How could I speed up comparison of std::string against string literals?

I have a bunch of code where objects of type std::string are compared for equality against string literals. Something like this:

//const std:string someString = //blahblahblah;
if( someString == "(" ) {
   //do something
} else if( someString == ")" ) {
   //do something else
} else if// this chain can be very long

The comparison time accumulates to a serious amount (yes, I profiled) and so it'd be nice to speed it up.

The code compares the string against numerous short string literals and this comparison can hardly be avoided. Leaving the string declared as std::string is most likely inevitable - there're thousands lines of code like that. Leaving string literals and comparison with == is also likely inevitable - rewriting the whole code would be a pain.

The problem is the STL implementation that comes with Visual C++11 uses somewhat strange approach. == is mapped onto std::operator==(const basic_string&, const char*) which calls basic_string::compare( const char* ) which in turn calls std::char_traits<char>( const char* ) which calls strlen() to compute the length of the string literal. Then the comparison runs for the two strings and lengths of both strings are passed into that comparison.

The compiler has a hard time analyzing all this and emits code that traverses the string literal twice. With short literals that's not much time but every comparison involves traversing the literal twice instead of once. Simply calling strcmp() would most likely be faster.

Is there anything I could do like perhaps writing a custom comparator class that would help avoid traversing the string literals twice in this scenario?

like image 926
sharptooth Avatar asked May 26 '14 14:05

sharptooth


People also ask

What is the best way to compare strings?

The right way of comparing String in Java is to either use equals(), equalsIgnoreCase(), or compareTo() method. You should use equals() method to check if two String contains exactly same characters in same order. It returns true if two String are equal or false if unequal.

How do I compare strings to other strings in C++?

In order to compare two strings, we can use String's strcmp() function. The strcmp() function is a C library function used to compare two strings in a lexicographical manner. The function returns 0 if both the strings are equal or the same. The input string has to be a char array of C-style string.

How do you compare string literals?

To compare string literals, still use the equality and relational operators, but for objects of the string class, and not for const char*s. Using the operators for const char*s compares the pointers, and not the string literals.

Are string comparisons slow?

Comparing two strings is very slow and expensive. Most algorithms require iterating through entire string and matching each character.


2 Answers

Similar to Dietmar's solution, but with slightly less editing: you can wrap the string (once) instead of each literal

#include <string>
#include <cstring>
struct FastLiteralWrapper {
    std::string const &s;

    explicit FastLiteralWrapper(std::string const &s_) : s(s_) {}

    template <std::size_t ArrayLength>
    bool operator== (char const (&other)[ArrayLength]) {
        std::size_t const StringLength = ArrayLength - 1;
        return StringLength == s.size()
            && std::memcmp(s.data(), other, StringLength) == 0;
    }
};

and your code becomes:

const std:string someStdString = "blahblahblah";
// just for the context of the comparison:
FastLiteralWrapper someString(someStdString);
if( someString == "(" ) {
   //do something
} else if( someString == ")" ) {
   //do something else
} else if// this chain can be very long

NB. the fastest solution - at the cost of more editing - is probably to build a (perfect) hash or trie mapping string literals to enumerated constants, and then just switch on the looked-up value. Long if/else if chains usually smell bad IMO.

like image 177
Useless Avatar answered Oct 14 '22 23:10

Useless


Well, aside from C++14's string_literal, you could easily code up a solution:

  1. For comparison with a single character, use a character literal and:

    bool operator==(const std::string& s, char c)
    {
      return s.size() == 1 && s[0] == c;
    }
    
  2. For comparison with a string literal, you can use something like this:

    template<std::size_t N>
    bool operator==(const std::string& s, char const (&literal)[N])
    {
      return s.size() == N && std::memcmp(s.data(), literal, N-1) == 0;
    }
    

Disclaimer:

  • The first might even be superfluous,
  • Only do this if you measure an improvement over what you had.
like image 27
rubenvb Avatar answered Oct 14 '22 23:10

rubenvb