I have .NET and C++ implementations of a perf test function that does 854,750 lookups in a dictionary using string keys from a pool of 6838 keys. I wrote these functions to investigate a perf bottleneck in a real app.
The .NET implementation is written in F#, uses Dictionary, and is compiled for .NET 4.0.
The C++ implementation uses std::unordered_map and is built with VS2010 in Release mode.
On my machine the .NET code runs in 240 ms on average and the C++ code runs in 630 ms. Could you please help me understand what could cause this huge difference in speed?
If I shorten the keys in the C++ implementation by using the prefix "key_" instead of "key_prefix_", it runs in 140 ms.
Another trick I tried was replacing std::string with a custom immutable string implementation that holds a const char* pointer to the source and a one-time computed hash. Using this string brought the C++ implementation down to 190 ms.
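For reference, here is a minimal sketch of the kind of cached-hash key type described above. The names (CachedHashKey, CachedHashKeyHasher) and the choice of FNV-1a are illustrative assumptions, not the original code; the idea is simply that the hash is computed once at construction and reused on every lookup.

```cpp
#include <cstring>
#include <unordered_map>

// Sketch of an immutable string key with a one-time computed hash.
// The pointed-to characters must outlive the key (no copy is made).
struct CachedHashKey {
    const char* Data;
    size_t Length;
    size_t Hash;

    explicit CachedHashKey(const char* s)
        : Data(s), Length(std::strlen(s)), Hash(2166136261u) {
        // FNV-1a over every character, computed exactly once.
        for (size_t i = 0; i < Length; ++i)
            Hash = (Hash ^ static_cast<unsigned char>(Data[i])) * 16777619u;
    }

    bool operator==(const CachedHashKey& other) const {
        return Length == other.Length &&
               std::memcmp(Data, other.Data, Length) == 0;
    }
};

// The hasher just returns the precomputed value, so lookups never re-hash.
struct CachedHashKeyHasher {
    size_t operator()(const CachedHashKey& k) const { return k.Hash; }
};

// Usage: std::unordered_map<CachedHashKey, float, CachedHashKeyHasher>
```

The trade-off is that the key does not own its characters, so the caller must keep the source buffer alive for the lifetime of the map entry.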
C++ code:
struct SomeData
{
public:
    float Value;
};

typedef std::string KeyString;
typedef std::unordered_map<KeyString, SomeData> DictionaryT;

const int MaxNumberOfRuns = 125;
const int MaxNumberOfKeys = 6838;

DictionaryT dictionary;
dictionary.rehash(MaxNumberOfKeys);

auto timer = Stopwatch::StartNew();
int lookupCount = 0;

char keyBuffer[100] = "key_prefix_";
size_t keyPrefixLen = std::strlen(keyBuffer);

/// run MaxNumberOfRuns * MaxNumberOfKeys iterations
for(int runId = 0; runId < MaxNumberOfRuns; runId++)
{
    for(int keyId = 0; keyId < MaxNumberOfKeys; keyId++)
    {
        /// get a new key from the pool of MaxNumberOfKeys keys
        int randomKeySuffix = (std::rand() % MaxNumberOfKeys);
        ::itoa(randomKeySuffix, keyBuffer + keyPrefixLen, 10);
        KeyString key = keyBuffer;

        /// lookup key in the dictionary
        auto dataIter = dictionary.find(key);
        SomeData* data;
        if(dataIter != dictionary.end())
        {
            /// get existing value
            data = &dataIter->second;
        }
        else
        {
            /// add a new value
            data = &dictionary.insert(dataIter,
                DictionaryT::value_type(key, SomeData()))->second;
        }

        /// update corresponding value in the dictionary
        data->Value += keyId * runId;
        lookupCount++;
    }
}

timer.Stop();
std::cout << "Time: " << timer.GetElapsedMilleseconds() << " ms" << std::endl;
std::cout << "Lookup count: " << lookupCount << std::endl;
Prints:
Time: 636 ms
Lookup count: 854750
F# code:
open System
open System.Diagnostics
open System.Collections.Generic

type SomeData =
    struct
        val mutable Value : float
    end

let dictionary = new Dictionary<string, SomeData>()
let randomGen = new Random()

let MaxNumberOfRuns = 125
let MaxNumberOfKeys = 6838

let timer = Stopwatch.StartNew()
let mutable lookupCount = 0

/// run MaxNumberOfRuns * MaxNumberOfKeys iterations
for runId in 1 .. MaxNumberOfRuns do
    for keyId in 1 .. MaxNumberOfKeys do
        /// get a new key from the pool of MaxNumberOfKeys keys
        let randomKeySuffix = randomGen.Next(0, MaxNumberOfKeys).ToString()
        let key = "key_prefix_" + randomKeySuffix

        /// lookup key in the dictionary
        let mutable found, someData = dictionary.TryGetValue (key)
        if not(found) then
            /// add a new value
            someData <- new SomeData()
            dictionary.[key] <- someData

        /// update corresponding value in the dictionary
        someData.Value <- someData.Value + float(keyId) * float(runId)
        lookupCount <- lookupCount + 1

timer.Stop()
printfn "Time: %d ms" timer.ElapsedMilliseconds
printfn "Lookup count: %d" lookupCount
Prints:
Time: 245 ms
Lookup count: 854750
So the bottom line is: make sure you have #include &lt;string&gt; if you're trying to use strings in an unordered_map&lt;&gt;, and in fact any time you're using std::string. Unfortunately, the compiler will sometimes let you get away without the include because of side effects from other includes.
std::unordered_map is supposedly slow because it has fairly stringent iterator invalidation requirements. In my experience, unless you are wringing every last bit of performance out of your code, it's not a huge issue; it's generally faster than most casual hand-rolled implementations.
In the worst case, however, an unordered_map is slower than a map: every operation on an unordered_map can degrade to O(n), whereas a map guarantees O(log n) for the same operations. On average, unordered_map's find is O(1).
Visual Studio 2010 uses a fast hash function for std::string rather than an accurate one. Basically, once the key string is 10 characters or longer, the hash function stops using every character and instead samples characters with a stride greater than 1:
size_t operator()(const _Kty& _Keyval) const
{   // hash _Keyval to size_t value by pseudorandomizing transform
    size_t _Val = 2166136261U;
    size_t _First = 0;
    size_t _Last = _Keyval.size();
    size_t _Stride = 1 + _Last / 10;

    for(; _First < _Last; _First += _Stride)
        _Val = 16777619U * _Val ^ (size_t)_Keyval[_First];
    return (_Val);
}
size() >= 10 - use every second character after the first
size() >= 20 - use every third character after the first

Thanks to this, collisions happen more frequently, which of course slows the code down. Try a custom hash function for the C++ version.
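As a concrete illustration of that suggestion, here is a hedged sketch of a drop-in hash functor that applies the same FNV-1a mixing as the VS2010 snippet above but always uses a stride of 1, so every character contributes to the hash. The functor name FullStringHash is my own; the constants mirror the snippet.

```cpp
#include <string>
#include <unordered_map>

// FNV-1a over *every* character of the key, unlike the VS2010
// implementation, which skips characters once size() >= 10.
struct FullStringHash {
    size_t operator()(const std::string& key) const {
        size_t val = 2166136261u;
        for (size_t i = 0; i < key.size(); ++i)  // stride is always 1
            val = 16777619u * val ^
                  static_cast<size_t>(static_cast<unsigned char>(key[i]));
        return val;
    }
};

// Drop-in replacement: only the third template argument changes.
typedef std::unordered_map<std::string, float, FullStringHash> FullHashMap;
```

With this hasher, keys like "key_prefix_1" and "key_prefix_2", which the strided hash can lump into the same bucket, are guaranteed to hash differently, since only their last character differs.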
We can only speculate why one version is faster than the other. You could definitely use a profiler here to tell you where the hot spot is. So don't take any of this as a definitive answer.
Your note about the C++ version being faster with a shorter key length is illuminating, because it could point to a couple of things:
Off the cuff, here are some wild observations based on my experience with unordered_map (though I'm more familiar with boost's implementation than Microsoft's):
In this example there's no reason to use std::string as the key type; just use the integer value. This would presumably make both the C++ and F# versions faster.
When you insert values into the map, it's probably not faster to do a find followed by an insert, since both require re-hashing the key string. Just use the [] operator, which does a find-or-insert operation on its own. (How much this matters depends on how often you find a hit in the map versus add a new value.)
If allocation is the bottleneck and you must use a string key type, you might get better performance out of storing a shared_ptr to the string rather than copying the string when you insert it into the map.
Try providing your own hash function for the key type that ignores the "key_prefix_" part of the string.
Try boost's implementation; maybe it's faster.
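To make the find-or-insert point above concrete, here is a minimal sketch of the inner-loop update using operator[]. The helper name AddSample is hypothetical; SomeData is the struct from the question. On a miss, operator[] value-initializes the mapped value (Value starts at 0.0f), so the separate find/insert branch disappears and the key is hashed only once.

```cpp
#include <string>
#include <unordered_map>

struct SomeData { float Value; };

// operator[] performs a single find-or-insert: on a miss the value is
// value-initialized (Value == 0.0f), so no separate find() is needed.
void AddSample(std::unordered_map<std::string, SomeData>& dict,
               const std::string& key, float delta) {
    dict[key].Value += delta;  // one hash, one bucket traversal
}
```

Note that this relies on the default-constructed value being a valid starting point for the update, which holds here because a zero-initialized float is the correct accumulator seed.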
Again, a profiler run would quickly tell you where to look for this kind of problem. Specifically, it will tell you whether the bottleneck is in hashing or in allocation.