I am trying to compare STL map and STL unordered_map for certain operations. I looked on the net and it only increased my doubts regarding which one is better as a whole, so I would like to compare the two on the basis of the operations they perform.

Which one performs faster in insert, delete and look-up? Which one takes less memory and less time to clear from memory? Any explanations are heartily welcome!
Thanks in advance
For a specific use, you should try both with your actual data and usage patterns and see which is actually faster... there are enough factors that it's dangerous to assume either will always "win".
Academically - as the number of elements increases towards infinity, those operations on a std::unordered_map (the C++ library offering for what computing science terms a "hash map" or "hash table") will tend to keep taking the same amount of time, O(1) (ignoring memory limits/caching etc.), whereas with a std::map (a balanced binary tree) each time the number of elements doubles it will typically need to do one extra comparison operation, so it gets gradually slower, O(log2 n).
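As a rough illustration of how you might measure this for your own workload (a sketch, not a rigorous benchmark - the element count, key type and access pattern here are arbitrary assumptions):

```
#include <chrono>
#include <iostream>
#include <map>
#include <random>
#include <unordered_map>
#include <vector>

// Times insert and look-up for any map-like container.
template <typename Map>
void benchmark(const char* name, const std::vector<int>& keys) {
    using clock = std::chrono::steady_clock;

    Map m;
    auto t0 = clock::now();
    for (int k : keys) m[k] = k;            // insert
    auto t1 = clock::now();

    long long hits = 0;
    for (int k : keys) hits += m.count(k);  // look-up
    auto t2 = clock::now();

    auto ms = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::cout << name << ": insert " << ms(t0, t1) << " ms, find "
              << ms(t1, t2) << " ms (hits " << hits << ")\n";
}

int main() {
    std::mt19937 rng(42);
    std::vector<int> keys(1000000);
    for (int& k : keys) k = static_cast<int>(rng());

    benchmark<std::map<int, int>>("std::map", keys);
    benchmark<std::unordered_map<int, int>>("std::unordered_map", keys);
}
```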
std::unordered_map implementations necessarily use open hashing: the fundamental expectation is that there'll be a contiguous array of "buckets", each logically a container of any values hashing thereto. It generally serves to picture the hash table as a vector<list<pair<key,value>>>, where getting from the vector elements to a value involves at least one pointer dereference as you follow the list-head pointer stored in the bucket to the initial list node; the insert/find/delete operations' performance depends on the length of the list, which on average equals the unordered_map's load_factor.
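A minimal sketch of that mental model (illustrative only, nothing like a production hash table; the fixed bucket count and the key/value types are assumptions made here for brevity):

```
#include <functional>
#include <iostream>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy "unordered_map": a vector of buckets, each a list of key/value pairs.
struct ToyHashMap {
    std::vector<std::list<std::pair<std::string, int>>> buckets{16};

    std::size_t bucket_for(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets.size();
    }

    void insert(const std::string& key, int value) {
        // No duplicate-key check in this sketch.
        buckets[bucket_for(key)].emplace_back(key, value);
    }

    // find: hash to a bucket, then walk the (hopefully short) chain.
    const int* find(const std::string& key) const {
        for (const auto& kv : buckets[bucket_for(key)])
            if (kv.first == key) return &kv.second;
        return nullptr;
    }
};

int main() {
    ToyHashMap m;
    m.insert("apple", 1);
    m.insert("banana", 2);
    if (const int* p = m.find("banana")) std::cout << "banana -> " << *p << "\n";
}
```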
If the max_load_factor is lowered (the default is 1.0), then there will be fewer collisions but more reallocation/rehashing during insertion and more wasted memory (which can hurt performance through increased cache misses).
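You can observe that trade-off directly; the exact bucket counts printed are implementation-specific, so treat this only as a way of watching rehashes happen:

```
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> um;
    um.max_load_factor(0.5f);  // fewer collisions, but more buckets and more rehashing

    std::size_t last_buckets = um.bucket_count();
    for (int i = 0; i < 100000; ++i) {
        um[i] = i;
        if (um.bucket_count() != last_buckets) {  // a rehash just happened
            last_buckets = um.bucket_count();
            std::cout << "size " << um.size() << " -> " << last_buckets
                      << " buckets, load_factor " << um.load_factor() << "\n";
        }
    }
}
```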
The memory usage for this most-obvious of unordered_map implementations involves both the contiguous array of bucket_count() list-head-iterator/pointer-sized buckets and one doubly-linked list node per key/value pair: typically bucket_count() + 2 * size() extra pointers of overhead, adjusted for any rounding-up of dynamic memory allocation request sizes the implementation might do (for example, if you ask for 100 bytes you might get 128, 256 or 512). An implementation's dynamic memory routines might use some memory for tracking the allocated/available regions too.
Still, the C++ Standard leaves room for real-world implementations to make some of their own performance/memory-usage decisions. They could, for example, keep the old contiguous array of buckets around for a while after allocating a new larger array, so rehashing values into the latter can be done gradually to reduce the worst-case performance at the cost of average-case performance as both arrays are consulted during operations.
A map is a binary tree (a balanced one, typically a red-black tree), and can be expected to employ pointers linking distinct heap memory regions returned by different calls to new. As well as the key/value data, each node in the tree needs parent, left, and right pointers (see Wikipedia's binary tree article if lost).
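A sketch of what such a node might carry - the exact layout and balance metadata vary by implementation, this is just to show the three-pointer linkage alongside the stored pair:

```
#include <string>
#include <utility>

// Hypothetical node layout for a std::map<std::string, int>-like tree:
// three linkage pointers per node in addition to the key/value pair.
struct TreeNode {
    TreeNode* parent = nullptr;
    TreeNode* left   = nullptr;
    TreeNode* right  = nullptr;
    bool      is_red = false;  // balance metadata (red-black colour)
    std::pair<const std::string, int> data;

    TreeNode(std::string key, int value) : data(std::move(key), value) {}
};

int main() {
    TreeNode root("apple", 1);
    root.left = new TreeNode("ant", 2);
    root.left->parent = &root;
    delete root.left;
}
```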
So, both unordered_map and map need to allocate nodes for key/value pairs, with the former typically having two-pointer/iterator overhead for prev/next-node linkage, and the latter having three for parent/left/right. But the unordered_map additionally has the single contiguous allocation for bucket_count() buckets (== size() / load_factor()).
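If you want actual numbers rather than pointer-counting, one approach is to plug a byte-counting allocator into both containers. This is a sketch: CountingAllocator is a hypothetical helper written here, not a standard facility, and the totals will vary between standard library implementations:

```
#include <cstddef>
#include <iostream>
#include <map>
#include <unordered_map>
#include <utility>

// Tallies the bytes requested from the heap by the containers using it.
static std::size_t g_bytes_allocated = 0;

template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        g_bytes_allocated += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

int main() {
    using Pair = std::pair<const int, int>;

    g_bytes_allocated = 0;
    {
        std::map<int, int, std::less<int>, CountingAllocator<Pair>> m;
        for (int i = 0; i < 100000; ++i) m[i] = i;
        std::cout << "map:           " << g_bytes_allocated << " bytes\n";
    }

    g_bytes_allocated = 0;
    {
        std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                           CountingAllocator<Pair>> um;
        for (int i = 0; i < 100000; ++i) um[i] = i;
        std::cout << "unordered_map: " << g_bytes_allocated << " bytes\n";
    }
}
```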
For most purposes that's not a dramatic difference in memory usage, and the deallocation time difference for one extra region is unlikely to be noticeable.
For those occasions when the container is populated up front then repeatedly searched without further inserts/erases, it can sometimes be fastest to use a sorted vector, searched using the Standard algorithms binary_search, equal_range, lower_bound and upper_bound. This has the advantage of a single contiguous memory allocation, which is much more cache friendly. It generally outperforms map, but unordered_map may still be faster - measure if you care.
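A sketch of that pattern - populate, sort once, then search with lower_bound; the key/value types here are arbitrary:

```
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    // Populate up front, then sort once by key.
    std::vector<std::pair<std::string, int>> v = {
        {"cherry", 3}, {"apple", 1}, {"banana", 2}};
    std::sort(v.begin(), v.end());  // pairs compare lexicographically, i.e. by key first

    // Repeated look-ups via binary search on the contiguous, cache-friendly array.
    auto find = [&](const std::string& key) -> const int* {
        auto it = std::lower_bound(v.begin(), v.end(), key,
            [](const auto& kv, const std::string& k) { return kv.first < k; });
        return (it != v.end() && it->first == key) ? &it->second : nullptr;
    };

    if (const int* p = find("banana")) std::cout << "banana -> " << *p << "\n";
}
```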
The reason both exist is that neither is better as a whole. Use either one, and switch if the other proves better for your usage.