I have a need to map strings of constant size, which contain just alphanumeric values (A-Z, 0-9, no lower case letters) to other strings. The unordered_map becomes very large (tens of millions of keys), while the mapped values are from a set of a few thousand strings. Upon profiling I found that most time is spent inserting new values into the map (operator[]), and also clearing the map takes a very long time.
std::unordered_map<std::string, std::string> hashMap;
while (...){
...
hashMap[key] = value; // ~50% of program time is spent here
...
}
hashMap.clear(); // Takes a very long time, at this point hashMap.size() > 20,000,000
My thoughts are that the string allocations/deallocations are very slow, as well as hashing and inserting into the map. Any suggestions to optimize this? Keep in mind that the key size is constant, and its contents are limited to a set of 36 characters, and that the mapped values are from a limited set. I'm open to using different container / data types other than strings and unordered_map.
Following a suggestions by Baum Mit Augen I changed my key type to unsigned long long and made a function to convert base 36 to decimal:
unsigned long long ConvertBase36(const char* num)
{
unsigned long long retVal = 0;
for (int i = 0; i < 12; i++)
{
unsigned int digit = 0;
char currChar = num[i];
if (currChar <= '9')
{
digit = currChar - '0';
}
else
{
digit = currChar - 'A' + 10;
}
retVal *= 36;
retVal += digit;
}
return retVal;
}
This gave me about 10% improvement in whole program runtime. I then tried to use the unordered_map reserve function again to see if it made any difference and it did not. Trying map instead of unordered_map did about 10% worse so I reverted that change. Finally replacing the string value with an unsigned int made things a bit faster.
Two unrelated suggestions, but both related to std::unordered_map::reserve
.
First, since your unordered map contains 10Ms of elements, there are probably many re-allocations/rehashes going on as you insert. At the start, you might want to reserve 10Ms of entries.
Since
the mapped values are from a set of a few thousand strings
you should be able to store the values themselves in a secondary unordered_set
that you first reserve
d to something large enough to ensure no iterators get invalidated on insert
s - see invalidation guarantees for unordered associative containers.
Your (primary) unordered_map
can then map string
s to std::unordered_set::const_iterator
s.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With