Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cost of using std::map with std::string keys vs int keys?

I know that the individual map queries take a maximum of log(N) time. However I was wondering, I have seen a lot of examples that use strings as map keys. What is the performance cost of associating a std::string as a key to a map instead of an int for example ?

std::map<std::string, aClass*> someMap; vs std::map<int, aClass*> someMap;

Thanks!

like image 459
Goles Avatar asked Dec 03 '09 21:12

Goles


People also ask

Can we use string as key in map C++?

A map is an associative container that maps keys to values, provides logarithmic complexity for inserting and finding, and constant time for erasing single elements. It is common for developers to use a map to keep track of objects by using a string key.

Why map is faster than unordered_map?

map just has a few house-keeping pointers, and memory for each object. Contrarily, unordered_map has a big array (these can get quite big in some implementations), and then additional memory for each object. If you need to be memory-aware, map should prove better, because it lacks the large array.

Which is faster std::map or std :: unordered_map?

Therefore, the optimized access with std::map is about 20% faster, but the access time of std::unordered_map about 6 times faster.

Should I use map or unordered_map C++?

map is used to store elements as key,value pairs in sorted order. unordered_map is used to store elements as key,value pairs in non-sorted order.


2 Answers

Analyzing algorithms for asymptotic performance is working on the operations that must be performed and the cost they add to the equation. For that you need to first know what are the performed operations and then evaluate its costs.

Searching for a key in a balanced binary tree (which maps happen to be) require O( log N ) complex operations. Each of those operations implies comparing the key for a match and following the appropriate pointer (child) if the key did not match. This means that the overall cost is proportional to log N times the cost of those two operations. Following pointers is a constant time operation O(1), and comparing keys depend on the key. For an integer key, comparisons are fast O(1). Comparing two strings is another story, it takes time proportional to the sizes of the strings involved O(L) (where I have used intentionally L as the length of string parameter instead of the more common N.

When you sum all the costs up you get that using integers as keys the total cost is O( log N )*( O(1) + O(1) ) that is equivalent to O( log N ). (O(1) gets hidden in the constant that the O notation silently hides.

If you use strings as keys, the total cost is O( log N )*( O(L) + O(1) ) where the constant time operation gets hidden by the more costly linear operation O(L) and can be converted into O( L * log N ). That is, the cost of locating an element in a map keyed by strings is proportional to the logarithm of the number of elements stored in the map times the average length of the strings used as keys.

Note that the big-O notation is most appropriate to use as an analysis tool to determine how the algorithm will behave when the size of the problem grows, but it hides many facts underneath that are important for raw performance.

As the simplest example, if you change the key from a generic string to an array of 1000 characters you can hide that cost within the constant dropped out of the notation. Comparing arrays of 1000 chars is a constant operation that just happens to take quite a bit of time. With the asymptotic notation that would just be a O( log N ) operation, as with integers.

The same happens with many other hidden costs, as the cost of creation of the elements that is usually considered as a constant time operation, just because it does not depend on the parameters to your problem (the cost of locating the block of memory in each allocation does not depend on your data set, but rather on memory fragmentation that is outside of the scope of the algorithm analysis, the cost of acquiring the lock inside malloc as to guarantee that not two processes try to return the same block of memory depends on the contention of the lock that depends itself number of processors, processes and how much memory requests they perform..., again out of the scope of the algorithm analysis). When reading costs in the big-O notation you must be conscious of what it really means.

like image 177
David Rodríguez - dribeas Avatar answered Oct 26 '22 05:10

David Rodríguez - dribeas


In addition to the time complexity from comparing strings already mentioned, a string key will also cause an additional memory allocation each time an item is added to the container. In certain cases, e.g. highly parallel systems, a global allocator mutex can be a source of performance problems.

In general, you should choose the alternative that makes the most sense in your situation, and only optimize based on actual performance testing. It's notoriously hard to judge what will be a bottleneck.

like image 45
Tim Sylvester Avatar answered Oct 26 '22 04:10

Tim Sylvester