Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ map<std::string> vs map<char *> performance (I know, "again?")

I was using a map with a std::string key and while everything was working fine I wasn't getting the performance I expected. I searched for places to optimize and improved things only a little and that's when a colleague said, "that string key is going to be slow."

I read dozens of questions and they consistently say:

"don't use a char * as a key"
"std::string keys are never your bottleneck"
"the performance difference between a char * and a std::string is a myth."

I reluctantly tried a char * key and there was a difference, a big difference.

I boiled the problem down to a simple example:

#include <stdio.h> #include <stdlib.h> #include <map>  #ifdef USE_STRING  #include <string> typedef std::map<std::string, int> Map;  #else  #include <string.h> struct char_cmp {      bool operator () (const char *a,const char *b) const      {         return strcmp(a,b)<0;     }  }; typedef std::map<const char *, int, char_cmp> Map;  #endif  Map m;  bool test(const char *s) {     Map::iterator it = m.find(s);     return it != m.end(); }  int main(int argc, char *argv[]) {     m.insert( Map::value_type("hello", 42) );      const int lcount = atoi(argv[1]);     for (int i=0 ; i<lcount ; i++) test("hello"); } 

First the std::string version:

$ g++ -O3 -o test test.cpp -DUSE_STRING $ time ./test 20000000 real    0m1.893s 

Next the 'char *' version:

g++ -O3 -o test test.cpp              $ time ./test 20000000 real    0m0.465s 

That's a pretty big performance difference and about the same difference I see in my larger program.

Using a char * key is a pain to handle freeing the key and just doesn't feel right. C++ experts what am I missing? Any thoughts or suggestions?

like image 738
uroc Avatar asked Aug 27 '12 03:08

uroc


People also ask

Can you use a string as a key in a map C++?

A map is an associative container that maps keys to values, provides logarithmic complexity for inserting and finding, and constant time for erasing single elements. It is common for developers to use a map to keep track of objects by using a string key.

Are maps in C++ slow?

Maps are 'fast enough' but not brilliant for some cases. Try to analyze what is the structure of objects you need to store. If the fields are fixed I'd recommend not to use nested maps. At all.

Which is faster set or map in C++?

The map solution results in "Time Limit Exceeded on Test 3", whereas the set solution results in "Time Limit Exceeded on Test 2", which means that Test 2 is such that the map solution works faster on it than the set solution.

Can we compare 2 maps in C++?

The C++ function std::map::operator== tests whether two maps are equal or not.


2 Answers

You are using a const char * as a lookup key for find(). For the map containing const char* this is the correct type that find expects and the lookup can be done directly.

The map containing std::string expects the parameter of find() to be a std::string, so in this case the const char* first has to be converted to a std::string. This is probably the difference you are seeing.

like image 117
sth Avatar answered Sep 25 '22 03:09

sth


As sth noted, the issue is one of specifications of the associative containers (sets and maps), in that their member search methods always force a conversion to the key_type, even if an operator< exists that would accept to compare your key against the keys in the map despite their different types.

On the other hand, the functions in <algorithm> do not suffer from this, for example lower_bound is defined as:

template< class ForwardIt, class T > ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value );  template< class ForwardIt, class T, class Compare > ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value, Compare comp ); 

So, an alternative could be:

std::vector< std::pair< std::string, int > > 

And then you could do:

std::lower_bound(vec.begin(), vec.end(), std::make_pair("hello", 0), CompareFirst{}) 

Where CompareFirst is defined as:

struct CompareFirst {      template <typename T, typename U>      bool operator()(T const& t, U const& u) const { return t.first < u.first; } }; 

Or even build a completely custom comparator (but it's a bit harder).

A vector of pair is generally more efficient in read-heavy loads, so it's really to store a configuration for example.

I do advise to provide methods to wrap the accesses. lower_bound is pretty low-level.

like image 28
Matthieu M. Avatar answered Sep 25 '22 03:09

Matthieu M.