Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why '==' is slow on std::string?

While profiling my application I realized that a lot of time is spent on string comparisons. So I wrote a simple benchmark and I was surprised that '==' is much slower than string::compare and strcmp! here is the code, can anyone explain why is that? or what's wrong with my code? because according to the standard '==' is just an operator overload and simply returnes !lhs.compare(rhs).

#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
#include "Timer.h"
#include <random>
#include <time.h>
#include <string.h>
using namespace std;
uint64_t itr  = 10000000000;//10 Billion
int len = 100;
int main() {
  srand(time(0));
  string s1(len,random()%128);
  string s2(len,random()%128);

uint64_t a = 0;
  Timer t;
  t.begin();
  for(uint64_t i =0;i<itr;i++){
    if(s1 == s2)
      a = i;
  }
  t.end();

  cout<<"==       took:"<<t.elapsedMillis()<<endl;

  t.begin();
  for(uint64_t i =0;i<itr;i++){
    if(s1.compare(s2)==0)
      a = i;
  }
  t.end();

  cout<<".compare took:"<<t.elapsedMillis()<<endl;

  t.begin();
  for(uint64_t i =0;i<itr;i++){
    if(strcmp(s1.c_str(),s2.c_str()))
      a = i;
  }
  t.end();

  cout<<"strcmp   took:"<<t.elapsedMillis()<<endl;

  return a;
}

And here is the result:

==       took:5986.74
.compare took:0.000349
strcmp   took:0.000778

And my compile flags:

CXXFLAGS = -O3 -Wall -fmessage-length=0 -std=c++1y

I use gcc 4.9 on a x86_64 linux machine.

Obviously using -o3 does some optimizations which I guess rolls out the last two loops totally; however, using -o2 still the results are weird:

for 1 billion iterations:

==       took:19591
.compare took:8318.01
strcmp   took:6480.35

P.S. Timer is just a wrapper class to measure spent time; I am absolutely sure about it :D

Code for Timer class:

#include <chrono>

#ifndef SRC_TIMER_H_
#define SRC_TIMER_H_


class Timer {
  std::chrono::steady_clock::time_point start;
  std::chrono::steady_clock::time_point stop;
public:
  Timer(){
    start = std::chrono::steady_clock::now();
    stop = std::chrono::steady_clock::now();
  }
  virtual ~Timer() {}

  inline void begin() {
    start = std::chrono::steady_clock::now();
  }

  inline void end() {
    stop = std::chrono::steady_clock::now();
  }

  inline double elapsedMillis() {
    auto diff = stop - start;
    return  std::chrono::duration<double, std::milli> (diff).count();
  }

  inline double elapsedMicro() {
    auto diff = stop - start;
    return  std::chrono::duration<double, std::micro> (diff).count();
  }

  inline double elapsedNano() {
    auto diff = stop - start;
    return  std::chrono::duration<double, std::nano> (diff).count();
  }

  inline double elapsedSec() {
    auto diff = stop - start;
    return std::chrono::duration<double> (diff).count();
  }
};

#endif /* SRC_TIMER_H_ */
like image 555
Behrooz Avatar asked Feb 26 '15 04:02

Behrooz


People also ask

Are STD strings slow?

Experiments have revelead that method find from C++ std::string is incredibly slow. The method can be slower an order of magnitude than strstr, and it doesn't get better with a newer stdlib — GCC versions varies from 4.9. 2 (Debian) to 5.4. 0 (Ubuntu).

What does std::string () do?

std::string class in C++ C++ has in its definition a way to represent a sequence of characters as an object of the class. This class is called std:: string. String class stores the characters as a sequence of bytes with the functionality of allowing access to the single-byte character.

Does std::string allocate?

However, std::string requires space to be dynamically allocated in the buffer and more memory to be dynamically allocated each time an operation is performed on the string.

What is a difference between the std::string and C style strings?

std::string is compatible with STL algorithms and other containers. C strings are not char * or const char * ; they are just null-terminated character arrays. Even string literals are just character arrays.


1 Answers

UPDATE: output of improved benchmark at http://ideone.com/rGc36a

==       took:21
.compare took:21
strcmp   took:14
==       took:21
.compare took:25
strcmp   took:14

The thing that proved crucial to get it working meaningfully was "outwitting" the compiler's ability to predict the strings being compared at compile time:

// more strings that might be used...
string s[] = { {len,argc+'A'}, {len,argc+'A'}, {len, argc+'B'}, {len, argc+'B'} };

if(s[i&3].compare(s[(i+1)&3])==0)  // trickier to optimise
  a += i;  // cumulative observable side effects

Note that in general, strcmp is not functionally equivalent to == or .compare when the text may embed NULs, as the former will get to "exit early". (That's not the reason it's "faster" above, but do read below for comments re possible variations with string length/content etc..)


Discussion / Earlier answer

Just have a look at your implementation - e.g.

echo '#include <string>' > stringE.cc
g++ -E stringE.cc | less

Search for the basic_string template, then for the operator== working on two string instances - mine is:

template<class _Elem,
    class _Traits,
    class _Alloc> inline
    bool __cdecl operator==(
            const basic_string<_Elem, _Traits, _Alloc>& _Left,
            const basic_string<_Elem, _Traits, _Alloc>& _Right)
    {
    return (_Left.compare(_Right) == 0);
    }

Notice that operator== is inline and simply calls compare. There's no way it's consistently significantly slower with normal optimisation levels enabled, though the optimiser might occasionally happen to optimise one loop better than another due to subtle side effects of surrounding code.

Your ostensible problem will have been caused by e.g. your code being optimised beyond the point of doing the intended work, for loops arbitrarily unrolled to different degrees, or other quirks or bugs in the optimisation or your timing. That's not unusual when you have unvarying inputs and loops that don't have any cumulative side-effects (i.e. the compiler can work out that intermediate values of a are not used, so only the last a = i need take affect).

So, learn to write better benchmarks. In this case, that's a bit tricky as having lots of distinct strings in memory ready to invoke the comparisons on, and selecting them in a way that the optimiser can't predict at compile time that's still fast enough not to overwhelm and obscure the impact of the string comparison code, is not an easy task. Further, beyond a point - comparing things spread across more memory makes cache affects more relevant to the benchmark, which further obscures the real comparison performance.

Still, if I were you I'd read some strings from a file - pushing each to a vector, then loop over the vector doing each of the three comparison operations between adjacent elements. Then the compiler can't possibly predict any pattern in the outcomes. You might find compare/== faster/slower than strcmp for strings often differing in the first character or three, but the other way around for long strings that are equal or only differing near the end, so make sure you try different kinds of input before you conclude you understand the performance profile.

like image 98
Tony Delroy Avatar answered Oct 11 '22 16:10

Tony Delroy