 

C++ ~ 1M look-ups in unordered_map with string key works much slower than .NET code


I have .NET and C++ implementations of a perf test function that does 854,750 lookups in a dictionary using string keys from a pool of 6838 keys. I wrote these functions to investigate a perf bottleneck in a real app.

The .NET implementation is written in F#, uses Dictionary, and is compiled for .NET 4.0.

The C++ implementation uses std::unordered_map and is built with VS2010 in Release mode.

On my machine the .NET code runs in 240 ms on average and the C++ code runs in 630 ms. Could you please help me understand the reason for this huge difference in speed?

If I make the keys in the C++ implementation shorter by using the "key_" prefix instead of "key_prefix_", it runs in 140 ms.

Another trick I tried was replacing std::string with a custom immutable string that holds a const char* pointer to the source characters and a hash computed once. Using this string brought the C++ implementation down to 190 ms; a rough sketch of the idea follows.
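A minimal sketch of that cached-hash key (the names ImmutableKey and CachedHash are illustrative, not the exact implementation):

#include <cstddef>
#include <cstring>

// Immutable view over an existing C string; the hash is computed once in
// the constructor. The pointed-to characters must outlive the map entries.
struct ImmutableKey
{
    const char* Ptr;
    size_t      Len;
    size_t      Hash;

    explicit ImmutableKey(const char* s)
        : Ptr(s), Len(std::strlen(s)), Hash(ComputeHash(s, Len)) {}

    static size_t ComputeHash(const char* s, size_t len)
    {
        size_t val = 2166136261U;                 // FNV-1a over all characters
        for (size_t i = 0; i < len; ++i)
            val = 16777619U * val ^ (size_t)(unsigned char)s[i];
        return val;
    }

    bool operator==(const ImmutableKey& other) const
    {
        return Len == other.Len && std::memcmp(Ptr, other.Ptr, Len) == 0;
    }
};

// Hash functor for unordered_map that just returns the cached value.
struct CachedHash
{
    size_t operator()(const ImmutableKey& key) const { return key.Hash; }
};

It is then used as std::unordered_map<ImmutableKey, SomeData, CachedHash>.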

C++ code:

#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <unordered_map>

struct SomeData
{
public:
    float Value;
};

typedef std::string KeyString;
typedef std::unordered_map<KeyString, SomeData> DictionaryT;

const int MaxNumberOfRuns = 125;
const int MaxNumberOfKeys = 6838;

DictionaryT dictionary;
dictionary.rehash(MaxNumberOfKeys);

auto timer = Stopwatch::StartNew();

int lookupCount = 0;

char keyBuffer[100] = "key_prefix_";
size_t keyPrefixLen = std::strlen(keyBuffer);

/// run MaxNumberOfRuns * MaxNumberOfKeys iterations
for(int runId = 0; runId < MaxNumberOfRuns; runId++)
{
    for(int keyId = 0; keyId < MaxNumberOfKeys; keyId++)
    {
        /// get a new key from the pool of MaxNumberOfKeys keys
        int randomKeySuffix = (std::rand() % MaxNumberOfKeys);
        ::itoa(randomKeySuffix, keyBuffer + keyPrefixLen, 10);

        KeyString key = keyBuffer;

        /// lookup key in the dictionary
        auto dataIter = dictionary.find(key);
        SomeData* data;

        if(dataIter != dictionary.end())
        {
            /// get existing value
            data = &dataIter->second;
        }
        else
        {
            /// add a new value
            data = &dictionary.insert(dataIter, DictionaryT::value_type(key, SomeData()))->second;
        }

        /// update corresponding value in the dictionary
        data->Value += keyId * runId;
        lookupCount++;
    }
}

timer.Stop();
std::cout << "Time: " << timer.GetElapsedMilliseconds() << " ms" << std::endl;
std::cout << "Lookup count: " << lookupCount << std::endl;

Prints:

Time: 636 ms
Lookup count: 854750

F# code:

open System
open System.Diagnostics
open System.Collections.Generic

type SomeData =
    struct
        val mutable Value : float
    end

let dictionary = new Dictionary<string, SomeData>()
let randomGen = new Random()

let MaxNumberOfRuns = 125
let MaxNumberOfKeys = 6838

let timer = Stopwatch.StartNew()

let mutable lookupCount = 0

/// run MaxNumberOfRuns * MaxNumberOfKeys iterations
for runId in 1 .. MaxNumberOfRuns do
    for keyId in 1 .. MaxNumberOfKeys do

        /// get a new key from the pool of MaxNumberOfKeys keys
        let randomKeySuffix = randomGen.Next(0, MaxNumberOfKeys).ToString()
        let key = "key_prefix_" + randomKeySuffix

        /// lookup key in the dictionary
        let mutable found, someData = dictionary.TryGetValue (key)
        if not(found) then
            /// add a new value
            someData <- new SomeData()
            dictionary.[key] <- someData

        /// update corresponding value in the dictionary
        someData.Value <- someData.Value + float(keyId) * float(runId)

        lookupCount <- lookupCount + 1

timer.Stop()

printfn "Time: %d ms" timer.ElapsedMilliseconds
printfn "Lookup count: %d" lookupCount

Prints:

Time: 245 ms
Lookup count: 854750

asked Dec 04 '11 by evgenyp



2 Answers

Visual Studio 2010 uses a fast hash function for std::string rather than a thorough one. Once the key string is longer than 10 characters, the hash function stops using every character: it walks the string with a stride greater than 1, so only a subset of the characters contributes to the hash.

size_t operator()(const _Kty& _Keyval) const
    {   // hash _Keyval to size_t value by pseudorandomizing transform
    size_t _Val = 2166136261U;
    size_t _First = 0;
    size_t _Last = _Keyval.size();
    size_t _Stride = 1 + _Last / 10;

    for(; _First < _Last; _First += _Stride)
        _Val = 16777619U * _Val ^ (size_t)_Keyval[_First];
    return (_Val);
    }
  • size() >= 10 - stride is 2, so only every second character is hashed
  • size() >= 20 - stride is 3, so only every third character is hashed
  • ...

Because of this, keys that share a long common prefix and differ only in a few characters collide more often, which of course slows the lookups down. Try a custom hash function for the C++ version.
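As a rough illustration (a sketch, not the library's or the question's code; the functor name HashAllChars is made up), a drop-in hash that runs the same FNV-1a transform over every character could look like this:

#include <string>
#include <unordered_map>

struct SomeData { float Value; };   // value type as in the question

// Hashes every character with FNV-1a (32-bit constants, matching the
// VS2010 code above) instead of striding over the string.
struct HashAllChars
{
    size_t operator()(const std::string& key) const
    {
        size_t val = 2166136261U;                              // FNV offset basis
        for (size_t i = 0; i < key.size(); ++i)
            val = 16777619U * val ^ (size_t)(unsigned char)key[i];  // FNV prime
        return val;
    }
};

// Plugged in as the third template argument:
typedef std::unordered_map<std::string, SomeData, HashAllChars> DictionaryT;

This hashes long keys more slowly than the stock functor, but keys that differ only in their suffix no longer pile up in the same buckets.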

answered by Xeo

We can only speculate about why one version is faster than the other; a profiler would tell you where the hot spot actually is, so don't take any of this as a definitive answer.

Your note about the C++ version being faster with a shorter key is illuminating, because it could point to a couple of things:

  • Maybe the hash function for std::string is optimized for short strings rather than long ones.
  • Maybe the longer string takes longer to copy into the unordered_map because it defeats the small string optimization in the VS2010 C++ library, so the copy into the map requires a heap allocation (a quick way to check the threshold is sketched right after this list).
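If allocation is the suspect, a small check (a sketch, not part of the original answer) shows where the small-string cutoff sits on a given standard library:

#include <iostream>
#include <string>

int main()
{
    // capacity() of a default-constructed string reveals how many characters
    // fit in the internal buffer before a heap allocation is required.
    std::string s;
    std::cout << "sizeof(std::string): " << sizeof(std::string) << "\n";
    std::cout << "inline capacity: " << s.capacity() << "\n";
    // Whether "key_prefix_1234" fits under that limit depends on the
    // implementation; a short key like "key_1234" almost certainly does.
    return 0;
}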

Off the cuff, here are some observations based on my experience with unordered_map (though I'm more familiar with the Boost implementation than Microsoft's).

  • In this example there's no reason to use std::string as the key type; just use the integer value. This would presumably make both the C++ and F# versions faster.

  • When you insert values into the map, it's probably not faster to do a find followed by an insert, since both require hashing the key string. Just use the [] operator, which does a find-or-insert operation on its own (see the sketch after this list). How much this helps depends on how often you find a hit in the map versus adding a new value.

  • If allocation is the bottleneck and you must use a string key type, you might get better performance by storing a shared pointer to the string rather than copying the string when you insert it into the map.

  • Try providing your own hash function for the key type that ignores the "key_prefix_" part of the string (also covered in the sketch after this list).

  • Try Boost's implementation; maybe it's faster.
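To make the [] operator and prefix-skipping suggestions concrete, here is a rough sketch under assumed names (SuffixHash, lookupOrInsert); it is an illustration, not a definitive implementation:

#include <string>
#include <unordered_map>

struct SomeData { float Value; };   // value type as in the question

// Assumption: every key starts with the 11-character "key_prefix_",
// so only the varying suffix is fed to FNV-1a.
struct SuffixHash
{
    size_t operator()(const std::string& key) const
    {
        const size_t prefixLen = 11;              // strlen("key_prefix_")
        size_t first = key.size() > prefixLen ? prefixLen : 0;
        size_t val = 2166136261U;
        for (size_t i = first; i < key.size(); ++i)
            val = 16777619U * val ^ (size_t)(unsigned char)key[i];
        return val;
    }
};

typedef std::unordered_map<std::string, SomeData, SuffixHash> DictionaryT;

// operator[] default-constructs the value on a miss, so one call replaces
// the find-then-insert pair and the key is hashed only once per lookup.
void lookupOrInsert(DictionaryT& dictionary, const std::string& key,
                    int keyId, int runId, int& lookupCount)
{
    SomeData& data = dictionary[key];
    data.Value += static_cast<float>(keyId * runId);
    ++lookupCount;
}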

Again, a profiler run would quickly tell you where to look for this kind of problem; specifically, it will tell you whether the bottleneck is in hashing or in allocation.

answered by karunski