Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why HashMap insert new Node on index (n - 1) & hash?

Why HashMap insert new Node on the index: tab[(n - 1) & hash]

Where hash = key.hashCode() ^ key.hashCode() >>> 16 And n = tab.length of array of Node<K,V>.

Why HashMap not put the Node just like that: tab[hash] ? Is it just another hashing function, like multiplication by 31 in most of hashCode() methods ? Thanks in advance for explanation!

like image 616
Ziemo Avatar asked Dec 01 '14 14:12

Ziemo


2 Answers

A good description by harold but I feel it is inadequate without an example. So heres one -

Whenever a new Hasmap is created the array size of internal Node[] table is always power of 2 and following method guarantees it -

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

So lets say you provide initial capacity as 5

cap = 5
n = cap - 1 =  4 = 0 1 0 0
n |= n >>> 1;    0 1 0 0 | 0 0 1 0 = 0 1 1 0 = 6
n |= n >>> 2;    0 0 1 1 | 0 1 1 0 = 0 1 1 1 = 7
n |= n >>> 4;    0 0 0 0 | 0 1 1 1 = 0 1 1 1 = 7
n |= n >>> 8;    0 0 0 0 | 0 1 1 1 = 0 1 1 1 = 7
n |= n >>> 16;   0 0 0 0 | 0 1 1 1 = 0 1 1 1 = 7
return n + 1     7 + 1 = 8 

So table size is 8 = 2^3

Now possible index values you can put your element in map are 0-7 since table size is 8. Now lets look at put method. It looks for bucket index as follows -

Node<K,V> p =  tab[i = (n - 1) & hash];

where n is the array size. So n = 8. It is same as saying

Node<K,V> p =  tab[i = hash % n];

So all we need to see now is how

hash % n == (n - 1) & hash

Lets again take an example. Lets say hash of a value is 10.

hash = 10
hash % n = 10 % 8 = 2
(n - 1) & hash = 7 & 10 = 0 1 1 1 & 1 0 1 0 = 0 0 1 0 = 2

Hope this helps. More details

PS: Above link goes to my blog that has a more details example explanation on this.

like image 130
Aniket Thakur Avatar answered Nov 04 '22 02:11

Aniket Thakur


Because hash may be out of range.

The "canonical solution" is to take the (positive) modulo of the hash with the length of the array, this code uses the fact that the array has a power-of-two length to replace an expensive modulo by a variable (modulo a constant is optimized pretty well) with a cheap bitwise AND.

like image 40
harold Avatar answered Nov 04 '22 02:11

harold