I have .NET and C++ implementations of a perf test function that does 854,750 lookups in a dictionary using string keys from a pool of 6838 keys. I wrote these functions to investigate a perf bottleneck in a real app.
The .NET implementation is written in F#, uses Dictionary, and is compiled for .NET 4.0.
The C++ implementation uses std::unordered_map and is built with VS2010 in Release mode.
On my machine the .NET code runs in 240 ms on average and the C++ code runs in 630 ms. Could you please help me understand what could cause this huge difference in speed?
If I shorten the keys in the C++ implementation by using the prefix "key_" instead of "key_prefix_", it runs in 140 ms.
Another trick I tried was replacing std::string with a custom immutable string implementation that holds a const char* pointer to the source and a one-time computed hash. Using this string brought the C++ implementation down to 190 ms.
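For reference, here is a minimal sketch of the kind of cached-hash key type described above. The names (CachedHashKey, CachedHashKeyHasher) and the choice of FNV-1a are illustrative assumptions, not the original code; the idea is simply that the hash is computed once at construction and reused on every lookup.

```cpp
#include <cstring>
#include <unordered_map>

// Sketch of an immutable string key with a one-time computed hash.
// The pointed-to characters must outlive the key (no copy is made).
struct CachedHashKey {
    const char* Data;
    size_t Length;
    size_t Hash;

    explicit CachedHashKey(const char* s)
        : Data(s), Length(std::strlen(s)), Hash(2166136261u) {
        // FNV-1a over every character, computed exactly once.
        for (size_t i = 0; i < Length; ++i)
            Hash = (Hash ^ static_cast<unsigned char>(Data[i])) * 16777619u;
    }

    bool operator==(const CachedHashKey& other) const {
        return Length == other.Length &&
               std::memcmp(Data, other.Data, Length) == 0;
    }
};

// The hasher just returns the precomputed value, so lookups never re-hash.
struct CachedHashKeyHasher {
    size_t operator()(const CachedHashKey& k) const { return k.Hash; }
};

// Usage: std::unordered_map<CachedHashKey, float, CachedHashKeyHasher>
```

The trade-off is that the key does not own its characters, so the caller must keep the source buffer alive for the lifetime of the map entry.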
C++ code:
struct SomeData
{
public:
    float Value;
};

typedef std::string KeyString;
typedef std::unordered_map<KeyString, SomeData> DictionaryT;

const int MaxNumberOfRuns = 125;
const int MaxNumberOfKeys = 6838;

DictionaryT dictionary;
dictionary.rehash(MaxNumberOfKeys);

auto timer = Stopwatch::StartNew();
int lookupCount = 0;

char keyBuffer[100] = "key_prefix_";
size_t keyPrefixLen = std::strlen(keyBuffer);

/// run MaxNumberOfRuns * MaxNumberOfKeys iterations
for(int runId = 0; runId < MaxNumberOfRuns; runId++)
{
    for(int keyId = 0; keyId < MaxNumberOfKeys; keyId++)
    {
        /// get a new key from the pool of MaxNumberOfKeys keys
        int randomKeySuffix = (std::rand() % MaxNumberOfKeys);
        ::itoa(randomKeySuffix, keyBuffer + keyPrefixLen, 10);
        KeyString key = keyBuffer;

        /// lookup key in the dictionary
        auto dataIter = dictionary.find(key);
        SomeData* data;
        if(dataIter != dictionary.end())
        {
            /// get existing value
            data = &dataIter->second;
        }
        else
        {
            /// add a new value
            data = &dictionary.insert(dataIter,
                DictionaryT::value_type(key, SomeData()))->second;
        }

        /// update corresponding value in the dictionary
        data->Value += keyId * runId;
        lookupCount++;
    }
}

timer.Stop();
std::cout << "Time: " << timer.GetElapsedMilleseconds() << " ms" << std::endl;
std::cout << "Lookup count: " << lookupCount << std::endl;
Prints:
Time: 636 ms
Lookup count: 854750
F# code:
open System
open System.Diagnostics
open System.Collections.Generic

type SomeData =
    struct
        val mutable Value : float
    end

let dictionary = new Dictionary<string, SomeData>()
let randomGen = new Random()

let MaxNumberOfRuns = 125
let MaxNumberOfKeys = 6838

let timer = Stopwatch.StartNew()
let mutable lookupCount = 0

/// run MaxNumberOfRuns * MaxNumberOfKeys iterations
for runId in 1 .. MaxNumberOfRuns do
    for keyId in 1 .. MaxNumberOfKeys do
        /// get a new key from the pool of MaxNumberOfKeys keys
        let randomKeySuffix = randomGen.Next(0, MaxNumberOfKeys).ToString()
        let key = "key_prefix_" + randomKeySuffix

        /// lookup key in the dictionary
        let mutable found, someData = dictionary.TryGetValue (key)
        if not(found) then
            /// add a new value
            someData <- new SomeData()
            dictionary.[key] <- someData

        /// update corresponding value in the dictionary
        someData.Value <- someData.Value + float(keyId) * float(runId)
        lookupCount <- lookupCount + 1

timer.Stop()
printfn "Time: %d ms" timer.ElapsedMilliseconds
printfn "Lookup count: %d" lookupCount
Prints:
Time: 245 ms
Lookup count: 854750
So the bottom line is: make sure you have #include &lt;string&gt; if you're trying to use strings in an unordered_map&lt;&gt;, and in fact any time you're using std::string. Unfortunately, the compiler will sometimes let you get away without the include because of side effects from other includes.
std::unordered_map is supposedly slow because it has fairly stringent iterator invalidation requirements. In my experience, unless you are wringing every last bit of performance out of your code, it's not a huge issue; it's generally faster than most casual hand-rolled implementations.
In the worst case, however, an unordered_map is slower than a map: every operation on an unordered_map can degrade to O(n), whereas a map guarantees O(log n) for the same operations. On average, unordered_map's find is O(1).
Visual Studio 2010 uses a fast hash function for std::string rather than an accurate one. Basically, once the key string is 10 characters or longer, the hash function stops using every character and instead samples characters with a stride greater than 1:
size_t operator()(const _Kty& _Keyval) const
{   // hash _Keyval to size_t value by pseudorandomizing transform
    size_t _Val = 2166136261U;
    size_t _First = 0;
    size_t _Last = _Keyval.size();
    size_t _Stride = 1 + _Last / 10;

    for(; _First < _Last; _First += _Stride)
        _Val = 16777619U * _Val ^ (size_t)_Keyval[_First];
    return (_Val);
}
size() >= 10 - use every second character after the first
size() >= 20 - use every third character after the first

Thanks to this, collisions happen more frequently, which of course slows the code down. Try a custom hash function for the C++ version.
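As a concrete illustration of that suggestion, here is a hedged sketch of a drop-in hash functor that applies the same FNV-1a mixing as the VS2010 snippet above but always uses a stride of 1, so every character contributes to the hash. The functor name FullStringHash is my own; the constants mirror the snippet.

```cpp
#include <string>
#include <unordered_map>

// FNV-1a over *every* character of the key, unlike the VS2010
// implementation, which skips characters once size() >= 10.
struct FullStringHash {
    size_t operator()(const std::string& key) const {
        size_t val = 2166136261u;
        for (size_t i = 0; i < key.size(); ++i)  // stride is always 1
            val = 16777619u * val ^
                  static_cast<size_t>(static_cast<unsigned char>(key[i]));
        return val;
    }
};

// Drop-in replacement: only the third template argument changes.
typedef std::unordered_map<std::string, float, FullStringHash> FullHashMap;
```

With this hasher, keys like "key_prefix_1" and "key_prefix_2", which the strided hash can lump into the same bucket, are guaranteed to hash differently, since only their last character differs.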
We can only speculate why one version is faster than the other. You could definitely use a profiler here to tell you where the hot spot is. So don't take any of this as a definitive answer.
Your note about the C++ version being faster with a shorter key length is illuminating, because it could point to a couple of things:
Off the cuff, here are some wild observations based on my experience with unordered_map (though I'm more familiar with boost's implementation than Microsoft's):
In this example there's no reason to use std::string as the key type; just use the integer value. This would presumably make both the C++ and F# versions faster.
When you insert values into the map, it's probably not faster to do a find followed by an insert, since both require re-hashing the key string. Just use the [] operator, which does a find-or-insert operation on its own. (How much this matters depends on how often you find a hit in the map versus add a new value.)
If allocation is the bottleneck and you must use a string key type, you might get better performance out of storing a shared_ptr to the string rather than copying the string when you insert it into the map.
Try providing your own hash function for the key type that ignores the "key_prefix_" part of the string.
Try boost's implementation; maybe it's faster.
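To make the find-or-insert point above concrete, here is a minimal sketch of the inner-loop update using operator[]. The helper name AddSample is hypothetical; SomeData is the struct from the question. On a miss, operator[] value-initializes the mapped value (Value starts at 0.0f), so the separate find/insert branch disappears and the key is hashed only once.

```cpp
#include <string>
#include <unordered_map>

struct SomeData { float Value; };

// operator[] performs a single find-or-insert: on a miss the value is
// value-initialized (Value == 0.0f), so no separate find() is needed.
void AddSample(std::unordered_map<std::string, SomeData>& dict,
               const std::string& key, float delta) {
    dict[key].Value += delta;  // one hash, one bucket traversal
}
```

Note that this relies on the default-constructed value being a valid starting point for the update, which holds here because a zero-initialized float is the correct accumulator seed.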
Again, a profiler run would quickly tell you where to look for this kind of problem. Specifically, it will tell you whether the bottleneck is in hashing or in allocation.