I am looking at the implementation of HashMap
in Java and am stuck at one point.
How is the indexFor
function calculated?
static int indexFor(int h, int length) { return h & (length-1); }
Thanks
Default capacity of Hashmap is 2^4 = 16 buckets. Let say we have well implemented hashcode() method, which make sure that key-value pair will be well distributed across 16 buckets equally. So, If there are 16 items in hashmap, then good hashcode method will distribute 1 item in each bucket.
HashMap uses multiple buckets and each bucket points to a Singly Linked List where the entries (nodes) are stored. Once the bucket is identified by the hash function using hashcode, then hashCode is used to check if there is already a key with the same hashCode or not in the bucket(singly linked list).
HashMap stores elements in so-called buckets and the number of buckets is called capacity. When we put a value in the map, the key's hashCode() method is used to determine the bucket in which the value will be stored. To retrieve the value, HashMap calculates the bucket in the same way – using hashCode().
The hash itself is calculated by the hashCode()
method of the object you're trying to store.
What you see here is calculating the "bucket" to store the object based on the hash h
. Ideally, to evade collisions, you would have the same number of buckets as is the maximum achievable value of h
- but that could be too memory demanding. Therefore, you usually have a lower number of buckets with a danger of collisions.
If h
is, say, 1000, but you only have 512 buckets in your underlying array, you need to know where to put the object. Usually, a mod
operation on h
would be enough, but that's too slow. Given the internal property of HashMap
that the underlying array always has number of buckets equal to 2^n
, the Sun's engineers could use the idea of h & (length-1)
, it does a bitwise AND with a number consisting of all 1
's, practically reading only the n
lowest bits of the hash (which is the same as doing h mod 2^n
, only much faster).
Example:
hash h: 11 1110 1000 -- (1000 in decimal) length l: 10 0000 0000 -- ( 512 in decimal) (l-1): 01 1111 1111 -- ( 511 in decimal - it will always be all ONEs) h AND (l-1): 01 1110 1000 -- ( 488 in decimal which is a result of 1000 mod 512)
It's not calculating the hash, it's calculating the bucket.
The expression h & (length-1)
does a bit-wise AND
on h
using length-1
, which is like a bit-mask, to return only the low-order bits of h
, thereby making for a super-fast variant of h % length
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With