Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does unordered_map increase in size when it has enough buckets due to "reserve"?

Consider this code. I reserve 6 spots for an unordered_map and insert 6 elements. Afterwards, there are 7 buckets. Why is this? The max_load_factor is 1 and there are enough buckets for the number of elements I insert.

#include <iostream>
#include <unordered_map>
using namespace std;

int main () {
  unordered_map<std::string,std::string> mymap = { 
            {"house","maison"},
            {"apple","pomme"},
            {"tree","arbre"},
            {"book","livre"},
            {"door","porte"},
            {"grapefruit","pamplemousse"}
  };

    unordered_map<std::string,std::string> mymap2; // THIS ONE!!!
    mymap2.reserve(6);
    for (auto i:mymap) {
        mymap2[i.first] = i.second;
    }


    std::cout << "max_load factor " << mymap2.max_load_factor() << " mymap has " << mymap2.bucket_count() << " buckets.\n";

      for (unsigned i=0; i<mymap2.bucket_count(); ++i) {
        cout << "bucket #" << i << " contains: ";
        for (auto it = mymap2.begin(i); it!=mymap2.end(i); ++it)
            cout << "[" << it->first << ":" << it->second << "] ";
          cout << endl;
      }

  return 0;
}

Output:

max_load factor 1 mymap has 7 buckets.
bucket #0 contains: 
bucket #1 contains: [book:livre] 
bucket #2 contains: [tree:arbre] 
bucket #3 contains: [house:maison] [grapefruit:pamplemousse] 
bucket #4 contains: 
bucket #5 contains: [door:porte] 
bucket #6 contains: [apple:pomme] 
like image 276
JobHunter69 Avatar asked Jan 29 '21 06:01

JobHunter69


People also ask

How much space does unordered_map take?

In libc++ sizeof map is 24 whereas sizeof unordered_map is 40.

What is default size of unordered_map?

By default, unordered_map containers have a max_load_factor of 1.0.

What are buckets in unordered_map?

C++ Unordered_map Library - bucket() Function Bucket is a memory space in the container's hash table to which elements are assigned based on the hash value of their key. Valid range of buckets is from 0 to bucket_count - 1.

What is map reserve ()?

The unordered_map::reserve() function is used to reserve the number of buckets in the unordered map to store at least n elements inside the map. If the bucket count cannot hold n elements, then the bucket count is increased and a rehashing occurs.

Why does the unordered_map container have a reserve method?

The unordered_map container has a reserve method because it is implemented using buckets, and not a tree as in map. a slot in the container's internal hash table to which elements are assigned based on the hash value of their key. Buckets are numbered from 0 to (bucket_count-1). ( source) A single bucket holds a variable number of items.

What is the return value of an unordered map?

Return Value: It returns the number of the element present in the unordered map. Constant i.e. O (1).

What determines the number of buckets in a container?

The number of elements in a bucket influences the time it takes to access a particular element in the bucket. The container automatically increases the number of buckets to keep the load factor (which is the average bucket size) below its max_load_factor. Bucket number.

Is it worth it to re-map a map with buckets?

I would say yes for two reasons: (1) It pre-allocates the memory for the buckets, and (2) it can prevent the need of a rehash, which as discussed earlier completely rebuilds the map.


2 Answers

The cplusplus.com website gives this explanation:

void reserve (size_type n);

Request a capacity change

Sets the number of buckets in the container (bucket_count) to the most appropriate to contain at least n elements.

If n is greater than the current bucket_count multiplied by the max_load_factor, the container's bucket_count is increased and a rehash is forced.

If n is lower than that, the function may have no effect.

At the time you declare your unordered_map variable, it has a bucket_count of 1 and a max_load_factor of 1. Then you reserve 6 buckets which is greater than max_load_factor multiplied by bucket_count

According to this definition, the behavior is, in my humble opinion, correct.

I added at line 17 of your code the following line to show the bucket_count before the reserveand indeed, it is 1

 std::cout << "BEFORE RESERVE max_load factor " << mymap2.max_load_factor() << " mymap has " << mymap2.bucket_count() << " buckets.\n";

The display is as follows:

BEFORE RESERVE max_load factor 1 mymap has 1 buckets.

After the reserve:

AFTER RESERVE max_load factor 1 mymap has 7 buckets.

Thus the behavior is normal in my humble opinion.

like image 132
Pat. ANDRIA Avatar answered Nov 09 '22 04:11

Pat. ANDRIA


Hash table implementations tend to pick between two desirable qualities:

  • maintaining a power-of-two bucket_count() (i.e. rounding any value passed to reserve up to the next power of two if necessary), so

    • a size_t value returned by the hash function can be mapped onto the range of buckets using a 1-CPU-cycle bitwise AND operation (e.g. 8 buckets -> hash value ANDed with 7);

    • this has the undesirable effect of chopping off the high-order bits from the hash value, so they don't help randomise the bucket placement

    • Visual C++ does this

  • maintaining a prime bucket_count()

    • this has the extremely desirable side-effect of having high-order bits in the hash value affect the bucket selection, so lower-quality (i.e. faster) hash functions still often manage a more equal, less-collision/clustering-prone, bucket placement

    • implemented naively, this forces the compiler to do a mod ("%") operation by a runtime-variable bucket_count(), which may take e.g. 40-90 CPU cycles, depending on the CPU. A faster alternative is use the index into the table of prime numbers used when sizing the hash table to switch into a mod operation by that hard-coded constant prime value, so the compiler can try to optimise the mod using bit shifts, subtractions or additions, or multiplications if that's necessary (you can see the kinds of optimisations possible in this snippet on godbolt)

    • GCC does this; I think clang does too.

So, summarily - when you ask for 6 buckets, GCC or clang will increase that to some prime - not necessarily the next one, but it appears that's happened in this case - to reduce the collision-proneness when you later insert elements.

like image 41
Tony Delroy Avatar answered Nov 09 '22 03:11

Tony Delroy