Consider this code. I reserve space for 6 elements in an unordered_map and then insert 6 elements. Afterwards, there are 7 buckets. Why is this? The max_load_factor is 1 and there are enough buckets for the number of elements I insert.
#include <iostream>
#include <unordered_map>

using namespace std;

int main() {
    unordered_map<std::string, std::string> mymap = {
        {"house", "maison"},
        {"apple", "pomme"},
        {"tree", "arbre"},
        {"book", "livre"},
        {"door", "porte"},
        {"grapefruit", "pamplemousse"}
    };

    unordered_map<std::string, std::string> mymap2; // THIS ONE!!!
    mymap2.reserve(6);
    for (auto i : mymap) {
        mymap2[i.first] = i.second;
    }

    std::cout << "max_load factor " << mymap2.max_load_factor() << " mymap has " << mymap2.bucket_count() << " buckets.\n";
    for (unsigned i = 0; i < mymap2.bucket_count(); ++i) {
        cout << "bucket #" << i << " contains: ";
        for (auto it = mymap2.begin(i); it != mymap2.end(i); ++it)
            cout << "[" << it->first << ":" << it->second << "] ";
        cout << endl;
    }
    return 0;
}
Output:
max_load factor 1 mymap has 7 buckets.
bucket #0 contains:
bucket #1 contains: [book:livre]
bucket #2 contains: [tree:arbre]
bucket #3 contains: [house:maison] [grapefruit:pamplemousse]
bucket #4 contains:
bucket #5 contains: [door:porte]
bucket #6 contains: [apple:pomme]
By default, unordered_map containers have a max_load_factor of 1.0.
A bucket is a slot in the container's internal hash table to which elements are assigned based on the hash value of their key. Buckets are numbered from 0 to bucket_count - 1, and a single bucket holds a variable number of items.
The unordered_map::reserve() function requests enough buckets for the map to store at least n elements. If the current bucket count cannot hold n elements, the bucket count is increased and a rehash occurs.
The unordered_map container has a reserve method because it is implemented using buckets, and not a tree as in map.
The number of elements in a bucket influences the time it takes to access a particular element in that bucket. The container automatically increases the number of buckets to keep the load factor (the average number of elements per bucket) below its max_load_factor.
Is reserving worthwhile? I would say yes, for two reasons: (1) it pre-allocates the memory for the buckets, and (2) it can prevent the need for a rehash, which, as discussed earlier, completely rebuilds the map.
The cplusplus.com website gives this explanation:

void reserve (size_type n);

Request a capacity change

Sets the number of buckets in the container (bucket_count) to the most appropriate to contain at least n elements. If n is greater than the current bucket_count multiplied by the max_load_factor, the container's bucket_count is increased and a rehash is forced. If n is lower than that, the function may have no effect.
At the time you declare your unordered_map variable, it has a bucket_count of 1 and a max_load_factor of 1. Then you call reserve(6), and 6 is greater than max_load_factor multiplied by the current bucket_count, so the bucket_count is increased and a rehash is forced. According to this definition, the behavior is, in my humble opinion, correct.
I added at line 17 of your code the following line to show the bucket_count before the reserve, and indeed it is 1:

std::cout << "BEFORE RESERVE max_load factor " << mymap2.max_load_factor() << " mymap has " << mymap2.bucket_count() << " buckets.\n";
The display is as follows:
BEFORE RESERVE max_load factor 1 mymap has 1 buckets.
After the reserve:
AFTER RESERVE max_load factor 1 mymap has 7 buckets.
Thus the behavior is normal in my humble opinion.
Hash table implementations tend to pick between two desirable qualities:

- Maintaining a power-of-two bucket_count() (i.e. rounding any value passed to reserve up to the next power of two if necessary), so a size_t value returned by the hash function can be mapped onto the range of buckets using a 1-CPU-cycle bitwise AND operation (e.g. 8 buckets -> hash value ANDed with 7). This has the undesirable effect of chopping off the high-order bits of the hash value, so they don't help randomise the bucket placement. Visual C++ does this.

- Maintaining a prime bucket_count(). This has the extremely desirable side effect that the high-order bits of the hash value affect the bucket selection, so lower-quality (i.e. faster) hash functions still often manage a more equal, less collision/clustering-prone bucket placement. Implemented naively, this forces the compiler to do a mod ("%") operation by a runtime-variable bucket_count(), which may take e.g. 40-90 CPU cycles, depending on the CPU. A faster alternative is to use the index into the table of prime numbers used when sizing the hash table to switch into a mod operation by that hard-coded constant prime value, so the compiler can try to optimise the mod using bit shifts, subtractions or additions, or multiplications if necessary (you can see the kinds of optimisations possible in this snippet on godbolt). GCC does this; I think clang does too.

So, summarily: when you ask for 6 buckets, GCC or clang will increase that to some prime - not necessarily the next one, but it appears that's what happened in this case - to reduce the collision-proneness when you later insert elements.