Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

number of hash buckets

Tags:

java

hashmap

In the HashMap documentation, it is mentioned that:

  • the initial capacity is simply the capacity at the time the hash table is created
  • the capacity is the number of buckets in the hash table.

Now suppose we have intial capacity of 16 (default), and if we keep adding elements to 100 nos, the capacity of hashmap is 100 * loadfactor.

Will the number of hash buckets is 100 or 16?

Edit:
From the solution I read: buckets are more than the elements added. Taking this as view point: so if we add Strings as key, we will get one element/bucket resulting in a lot of space consumption/complexity, is my understanding right ?

like image 413
lowLatency Avatar asked Apr 30 '12 06:04

lowLatency


2 Answers

In doc

When the number of entries in the hash table exceeds the product of the 
load factor and the current capacity, the capacity is roughly doubled by 
calling the rehash method.

threshold=product of the  load factor and the current capacity

Lets try.. initial size of hashmap is 16
and default load factor is 0.75 so 1st threshold is 12 so adding 12 element next capacity will be.. (16*2) =32
2st threshold is 24 so after adding 24th element next capacity will be (32*2)=64

and so on..

like image 160
Sumit Singh Avatar answered Oct 22 '22 00:10

Sumit Singh


Neither 100 nor 16 buckets. Most likely there will be 256 buckets, but this isn't guaranteed by the documentation.

From the updated documentation link:

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

(emphasis mine)

So, if we ignore the word "approximately" above, we determine that whenever the hash table becomes 75% full (or whichever load factor you specify in the constructor), the number of hash buckets doubles. That means that the number of buckets doubles whenever you insert the 12th, 24th, 48th, and 96th elements, leaving a total of 256 buckets.

However, as I emphasized in the documentation snippet, the number is approximately twice the previous size, so it may not be exactly 256. In fact, if the second-to-last doubling is replaced with a slightly larger increase, the last doubling may never happen, so the final hash table may be as small as 134 buckets, or may be larger than 256 elements.

N.B. I arrived at the 134 number because it's the smallest integer N such that 0.75 * N > 100.

like image 20
Adam Mihalcin Avatar answered Oct 22 '22 01:10

Adam Mihalcin