 

Which one is better, std::map or std::unordered_map, for the following cases?

I am trying to compare std::map and std::unordered_map for certain operations. Looking around on the net has only increased my doubts about which one is better as a whole, so I would like to compare the two on the basis of the operations they perform.

Which one performs faster in:

Insert, Delete, Look-up

Which one takes less memory, and less time to clear from memory? Any explanations are heartily welcomed!

Thanks in advance

asked Sep 12 '12 by Anurag Sharma

2 Answers

Which one performs faster in insert, delete, look-up? Which one takes less memory and less time to clear from memory? Any explanations are heartily welcomed!

For a specific use, you should try both with your actual data and usage patterns and see which is actually faster... there are enough factors that it's dangerous to assume either will always "win".
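A minimal micro-benchmark sketch along those lines (the element count, the int keys, and the uniform random distribution are placeholder assumptions; substitute your real data and access pattern, and treat the numbers as indicative only):

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <random>
#include <unordered_map>
#include <vector>

// Times N inserts followed by N successful look-ups for a given container type.
template <typename Container>
long long time_insert_and_find(const std::vector<int>& keys)
{
    using clock = std::chrono::steady_clock;
    Container c;
    const auto start = clock::now();
    for (int k : keys) c[k] = k;                          // insert
    long long checksum = 0;
    for (int k : keys) checksum += c.find(k)->second;     // look-up
    const auto stop = clock::now();
    volatile long long sink = checksum;                   // keep the work from being optimised away
    (void)sink;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

int main()
{
    std::mt19937 gen(42);
    std::vector<int> keys(100000);
    for (int& k : keys) k = static_cast<int>(gen());

    std::printf("std::map:           %lld us\n",
                time_insert_and_find<std::map<int, int>>(keys));
    std::printf("std::unordered_map: %lld us\n",
                time_insert_and_find<std::unordered_map<int, int>>(keys));
}
```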

implementation and characteristics of unordered maps / hash tables

Academically, as the number of elements increases towards infinity, those operations on a std::unordered_map (the C++ library offering for what Computing Science terms a "hash map" or "hash table") will tend to keep taking the same amount of time, O(1) (ignoring memory limits, caching, etc.), whereas with a std::map (a balanced binary tree) each doubling of the number of elements typically adds one more comparison, so it gets gradually slower: O(log₂ n).

std::unordered_map implementations necessarily use open hashing (also known as separate chaining): the fundamental expectation is that there will be a contiguous array of "buckets", each logically a container of the values hashing to it.

It generally serves to picture the hash table as a vector<list<pair<key,value>>>, where getting from a bucket to a value involves at least one pointer dereference: following the list-head pointer stored in the bucket to the first list node. The performance of insert/find/erase then depends on the length of that list, which on average equals the unordered_map's load_factor().
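A short sketch of how to inspect that structure through the standard bucket interface of std::unordered_map (the printed values are implementation-dependent):

```cpp
#include <cstdio>
#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, int> m;
    for (int i = 0; i < 1000; ++i)
        m["key" + std::to_string(i)] = i;

    std::printf("size:         %zu\n", m.size());
    std::printf("bucket_count: %zu\n", m.bucket_count());
    std::printf("load_factor:  %.2f\n", m.load_factor());   // average list length per bucket

    // Which bucket a particular key hashes to, and how long that bucket's list is:
    const std::size_t b = m.bucket("key42");
    std::printf("\"key42\" -> bucket %zu holding %zu element(s)\n", b, m.bucket_size(b));
}
```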

If the max_load_factor is lowered (the default is 1.0), there will be fewer collisions but more reallocation/rehashing during insertion and more wasted memory (which can hurt performance through increased cache misses).
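A sketch of the two standard tuning knobs involved here; the value 0.5 and the element count are only examples, not recommendations:

```cpp
#include <unordered_map>

int main()
{
    std::unordered_map<int, int> m;
    m.max_load_factor(0.5f);   // rehash sooner: shorter bucket lists, more buckets
    m.reserve(100000);         // pre-size the bucket array for 100000 elements
                               // at the current max_load_factor

    for (int i = 0; i < 100000; ++i)
        m[i] = i;              // no rehashing occurs during this loop
}
```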

The memory usage for this most-obvious of unordered_map implementations involves both the contiguous array of bucket_count() list-head-iterator/pointer-sized buckets and one doubly-linked list node per key/value pair. Typically, bucket_count() + 2 * size() extra pointers of overhead, adjusted for any rounding-up of dynamic memory allocation request sizes the implementation might do. For example, if you ask for 100 bytes you might get 128 or 256 or 512. An implementation's dynamic memory routines might use some memory for tracking the allocated/available regions too.
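A back-of-the-envelope sketch of that estimate, computed from the container's own introspection functions; it ignores allocator rounding and bookkeeping, so treat it as a rough lower bound rather than an exact figure:

```cpp
#include <cstdio>
#include <unordered_map>

int main()
{
    std::unordered_map<int, int> m;
    for (int i = 0; i < 100000; ++i)
        m[i] = i;

    const std::size_t ptr = sizeof(void*);
    const std::size_t overhead =
        m.bucket_count() * ptr     // bucket array of list-head pointers
        + m.size() * 2 * ptr;      // two pointer-sized words of linkage per node

    std::printf("approx. pointer overhead: %zu bytes for %zu elements\n",
                overhead, m.size());
}
```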

Still, the C++ Standard leaves room for real-world implementations to make some of their own performance/memory-usage decisions. They could, for example, keep the old contiguous array of buckets around for a while after allocating a new larger array, so that rehashing values into the latter can be done gradually, reducing the worst-case latency of individual operations at the cost of average-case performance, as both arrays are consulted during operations.

implementation and characteristics of maps / balanced binary trees

A map is a balanced binary search tree (in practice, usually a red-black tree), and can be expected to employ pointers linking distinct heap memory regions returned by different calls to new. As well as the key/value data, each node in the tree needs parent, left, and right pointers (see Wikipedia's binary tree article if lost).

comparison

So, both unordered_map and map need to allocate nodes for key/value pairs with the former typically having two-pointer/iterator overhead for prev/next-node linkage, and the latter having three for parent/left/right. But, the unordered_map additionally has the single contiguous allocation for bucket_count() buckets (== size() / load_factor()).
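For illustration only, a sketch of hypothetical node layouts matching that 2-vs-3 pointer comparison; real library nodes differ in the details (e.g. some unordered_map implementations use a singly-linked chain plus a cached hash code, which is still roughly two pointer-sized words, and map nodes also carry a red/black colour field):

```cpp
#include <cstdio>
#include <utility>

template <typename Key, typename Value>
struct HashNode {              // one per element in a chaining-based unordered_map
    HashNode* prev;
    HashNode* next;
    std::pair<const Key, Value> kv;
};

template <typename Key, typename Value>
struct TreeNode {              // one per element in a map (balanced BST)
    TreeNode* parent;
    TreeNode* left;
    TreeNode* right;
    std::pair<const Key, Value> kv;
};

int main()
{
    std::printf("hash node: %zu bytes, tree node: %zu bytes (int -> int)\n",
                sizeof(HashNode<int, int>), sizeof(TreeNode<int, int>));
}
```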

For most purposes that's not a dramatic difference in memory usage, and the deallocation time difference for one extra region is unlikely to be noticeable.

another alternative

For those occasions when the container is populated up front and then repeatedly searched without further inserts/erases, it can sometimes be fastest to use a sorted vector, searched using the Standard algorithms binary_search, equal_range, lower_bound, and upper_bound. This has the advantage of a single contiguous memory allocation, which is much more cache friendly. In that populate-then-search scenario it consistently outperforms map, but unordered_map may still be faster; measure if you care.
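A sketch of that alternative with placeholder data, using lower_bound for the look-up:

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

int main()
{
    // Populate up front, then sort by key (pairs compare by first, then second).
    std::vector<std::pair<std::string, int>> v = {
        {"cherry", 3}, {"apple", 1}, {"banana", 2}
    };
    std::sort(v.begin(), v.end());

    // Look-up via binary search: O(log n) with no per-node pointer chasing.
    const std::string key = "banana";
    auto it = std::lower_bound(v.begin(), v.end(), key,
        [](const std::pair<std::string, int>& p, const std::string& k) {
            return p.first < k;
        });
    if (it != v.end() && it->first == key)
        std::printf("%s -> %d\n", it->first.c_str(), it->second);
}
```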

answered Nov 15 '22 by Tony Delroy


The reason both exist is that neither is better as a whole.

Use either one. Switch if the other proves better for your usage.

  • std::map provides better space for worse time.
  • std::unordered_map provides better time for worse space.
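One way to keep that switch cheap (a sketch, not part of the original answer): hide the choice behind a type alias, since the two containers share most of their interface. Ordered iteration and a few members such as lower_bound() or bucket_count() are not interchangeable, but basic insert/find/erase code compiles unchanged.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Flip this alias to std::map<std::string, int> and re-run your measurements.
using Index = std::unordered_map<std::string, int>;

int main()
{
    Index idx;
    idx["apples"] = 3;
    idx["pears"]  = 7;
    return idx.count("apples") ? 0 : 1;   // same code compiles for either container
}
```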
answered Nov 15 '22 by Drew Dormann