
How does std::unordered_map handle collisions? [duplicate]

std::unordered_map guarantees O(1) time search, but how does it manage collision?

Cppreference claims

Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity.

Assuming a situation where all the hash codes are the same, how is the collision handled internally?

My assumption would be totally wrong if the hash code is unique to every key. In that case how is the unique hash code created where there are no collisions at all?

What approach does std::unordered_map's hash function take to guarantee O(1) search?

user2256825 asked May 04 '16 11:05




3 Answers

It doesn't guarantee O(1). Lookup is O(1) on average; in the worst case, when many keys collide in the same bucket, it can be O(n). See the link below for more info:

https://stackoverflow.com/a/2771398/5874704

Update

Since the question has been edited and now asks specifically about collisions in std::unordered_map, please have a look at the following answer:

https://stackoverflow.com/a/21519560/5874704

I think we can conclude that all practical implementations of std::unordered_set (or unordered_map) almost certainly use collision chaining. While it might be (just barely) possible to meet the requirements using linear probing or double hashing, such an implementation seems to lose a great deal and gain nearly nothing in return.

A.Fagrell answered Oct 15 '22 11:10


Your post omits something that is crucial to understand: std::unordered_map has average-case O(1) search. In the worst case, retrieving an element can take O(n) time, where n is the number of elements in the map.

As for which hash function it uses: that is up to the user. By default it uses std::hash of the key type.

The only requirement on the hashing function with respect to collision handling is:

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision DoS attacks. (cppreference)

erip answered Oct 15 '22 11:10


std::unordered_map guarantees O(1) time search, but how does it manage collision?

It uses separate chaining (also known as open hashing), not open addressing.

Cppreference claims

Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity.

Assuming a situation where all the hash codes are the same, how is the collision handled internally?

The colliding elements are added into another container holding all values that hashed to that bucket. That container is usually a linked list, but there's nothing stopping an implementation using e.g. a binary tree.

My assumption would be totally wrong if the hash code is unique to every key. In that case how is the unique hash code created where there are no collisions at all?

unordered_map isn't required or expected to do anything special to avoid collisions. (Hash codes being "unique to every key" doesn't suffice anyway, as collisions can be created when hash codes are masked or mod-ed into the number of buckets.)

What approach does std::unordered_map's hash function take to guarantee O(1) search?

This is the crux of your misunderstanding. unordered_map has average O(1) performance when the hash function does an adequate job of distributing the keys across the buckets. It may degrade to O(n) if the hash function is poor, or if it is deliberately targeted with keys known to hash to the same bucket. The Standard does not require implementations to prevent that, but users can supply a cryptographic hash, pick a hash function from a family at runtime, or otherwise make it impractical for a malicious user (or for similar inputs generally) to create many collisions.

Tony Delroy answered Oct 15 '22 10:10