Not sure how unordered_map works

Tags:

unordered-map

I've a little bit confusion how unordered_map works and what buckets are and how thet are managed.

From this blog post, unordered_map is vector of vectors.

My questions are:

is it correct to assume that the buckets are the "internal" vectors?
since each bucket (vector) can contains multiple elements, given by an hash collision on the hash table (the "external" vector), and since we have to scan this internal vector (in linear time), is it correct to assume that we have to define the equal method on the key type (in addiction to the hash operator) in order to find the key inside the bucket?
what is the external vector (hash table) size by default?
what is the internal vector size by default?
what happens if the number of elements in one bucket becomes too big?bor in other words, when the rehash happens?

Sorry for these questions, but I didn't find any detailed explanation how this structure works (on cppreference.com for example).

217

asked Apr 17 '16 05:04

2 Answers

std::unordered_map is the standard C++ hash table. It used to be called hash_map in STL, but missed the boat when many of STL's interfaces were merged into C++ in 1998, and by 2011, so many libraries had their own hash_map, C++ had to pick another name (I think "unordered" was a great choice; assuming order in a hash table is a common source of bugs).

is it correct to assume that the buckets are the "internal" vectors?

no, it is both incorrect (incompatible with iterator invalidation requirements) and dangerous (under that assumption you may end up subtracting pointers to elements in the same bucket).

In real life, the buckets are linked lists; e.g.

LLVM libc++ unordered_map is a unique_ptr to an array of linked lists of __hash_nodes
GNU libstdc++ unordered_map is a pointer to an array of linked lists of _Hash_nodes

is it correct to assume that we have to define the equal method on the key type (in addiction to the hash operator) in order to find the key inside the bucket?

Yes, locating the key in a bucket is exactly what the 4th template parameter of std::unordered_map is for (does not need to call the "equal method on the key type" literally, of course)

what is the external vector (hash table) size by default?

There is no "external vector". The number of buckets for a default-constructed std::unordered_map is implementation-defined, you can query it with bucket_count.

what is the internal vector size by default?

There is no "internal vector". The size of any given bucket equals the number of elements currently placed in the bucket. You can query it with bucket_size

what happens if the number of elements in one bucket becomes too big?bor in other words, when the rehash happens?

Nothing happens if the number of elements in one bucket becomes too big. But if average number of elements per bucket (aka load_factor) exceeds max_load_factor, rehash happens (e.g. on insert)

answered Nov 09 '22 16:11

Cubbi

This may help you understand the buckets: http://www.cplusplus.com/reference/unordered_map/unordered_map/bucket_count/ http://www.cplusplus.com/reference/unordered_map/unordered_map/max_load_factor/

But generally, yes, the buckets are something like internal vectors. It needs an equality operator (or a predicate) to distinguish keys that had the same hash as you suggest.

The initial number of buckets is possibly 0. It can be set via rehash() or reserve() (they have slightly different semantics.)

http://www.cplusplus.com/reference/unordered_map/unordered_map/rehash/

In an ideal case, each bucket will only have the one item. You can check this by using bucket_size. When the load factor (total items vs. bucket count) gets high, it rehashes automatically.

By default, it'll aim for a 1:1 load factor. If the hash function is good, this may last until max_bucket_count items are inserted.

Keep in mind that the specific implementation of this may vary. Each implementation (e.g. from different platforms or standard libraries) really only needs to have the correct semantics.

If these answers are important to your program, you can query the values as I've described. If you're just trying to wrap your head around it, query them in some test scenarios and it may become more clear.

answered Nov 09 '22 14:11

Todd Christensen

Related questions
                            
                                How to set console font to Raster Font, programmatically?
                            
                                Memory Sanitizer
                            
                                gtest unexpected conversion
                            
                                Need help regarding macro definition
                            
                                What is the most compact way to extract the template arguments of a class and iterate over them?
                            
                                How to iterate on unordered pairs inside an unordered_set?
                            
                                CLion "Instantiating an unknown structure without reference" but compiles fine
                            
                                operator T&() or operator T()?
                            
                                polymorphism in C++ and my exam
                            
                                ignoring packed attribute because of unpacked non-POD field
                            
                                Incrementing pointer prints garbage?
                            
                                The reason why dynamic_cast doesn't work with non-polymorphic types
                            
                                Variadic templates and multiple inheritance in c++11
                            
                                Declaring Qt class in header file [duplicate]
                            
                                In C++ - is it possible to compare volatile shared_ptr to nullptr?
                            
                                Bit planes of a 1-plane image in OpenCV only work for 1/3 of the image
                            
                                Is it safe to use a C++ Technical Specifications approved for a future standard in a earlier standard?
                            
                                In C++, is there to return out of a nested function that will end the parent function?
                            
                                decltype parenthesis syntax for a lvalue
                            
                                Does boost::asio support websockets?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With