Let's assume we have some code:
import java.util.HashMap;

class WrongHashCode {
    public int code = 0;

    @Override
    public int hashCode() {
        return code;
    }
}
public class Rehashing {
    public static void main(String[] args) {
        // Initial capacity is 2 and the load factor is 75%
        HashMap<WrongHashCode, String> hashMap = new HashMap<>(2, 0.75f);
        WrongHashCode wrongHashCode = new WrongHashCode();

        // Put the object that will be "lost"
        hashMap.put(wrongHashCode, "Test1");

        // Change the hashcode of the same key object
        wrongHashCode.code++;

        // This put resizes the hashMap because the load factor threshold is crossed
        hashMap.put(wrongHashCode, "Test2");

        // Always prints 2
        System.out.println("Keys count " + hashMap.keySet().size());
    }
}
So, my question is: why, after resizing the hashMap (which, as far as I understand, involves rehashing the keys), do we still have 2 keys in the keySet instead of 1 (since the key object is the same for both existing KV pairs)?
So, my question is why after resizing hashMap (that, as far as I understand, involves rehashing keys)
It actually does not involve rehashing keys – at least not in the HashMap code, except in certain circumstances (see below). It involves repositioning them in the map buckets. Inside of HashMap is an Entry class which has the following fields:
final K key;
V value;
Entry<K,V> next;  // next entry in the same bucket (collision chain)
int hash;         // the key's hashcode, cached when the entry is created
The hash field is the stored hashcode for the key, calculated when the put(...) call is made. This means that if you change the hashcode in your object, it will not affect the entry in the HashMap unless you re-put it into the map. Of course, if you change the hashcode for a key, you won't even be able to find it in the HashMap, because its current hashcode no longer matches the stored hash of the entry.
we still have 2 keys in keySet instead of 1 (since key object is same for both existing KV pairs)?
So even though you've changed the hashcode of the single key object, the map now holds 2 entries for it, each with a different stored hash field.
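To make that concrete, here is a minimal sketch (the MutableKey and CachedHashDemo class names are made up for this example); the lookup fails even before any resize, because the stored hash no longer matches what hashCode() returns:

import java.util.HashMap;

class MutableKey {
    public int code = 0;

    @Override
    public int hashCode() {
        return code;
    }
}

public class CachedHashDemo {
    public static void main(String[] args) {
        HashMap<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey();
        map.put(key, "value"); // the Entry stores the hash for code == 0
        key.code++;            // hashCode() now returns 1

        // The lookup hashes the key to 1, probes the wrong bucket, finds nothing
        System.out.println(map.containsKey(key)); // false
        System.out.println(map.get(key));         // null
    }
}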
All that said, there is code inside of HashMap which may rehash the keys when a HashMap is resized – see the package-protected HashMap.transfer(...) method in JDK 7 (at least). This is why the hash field above is not final. It is only used, however, when initHashSeedAsNeeded(...) returns true to use "alternative hashing". The following sets the threshold number of entries above which alt-hashing is enabled:
-Djdk.map.althashing.threshold=1
With this set on the VM, I'm actually able to get hashCode() to be called again when the resizing happens, but I'm not able to get the 2nd put(...) to be seen as an overwrite. Part of the problem is that the HashMap.hash(...) method does an XOR with the internal hashSeed, which is changed when the resizing happens, but after the put(...) records the new hash code for the incoming entry.
The HashMap actually caches the hashCode for each key (since a key's hashCode may be expensive to compute). So, although you changed the hashCode of an existing key, the Entry it is linked to in the HashMap still holds the old code (and hence ends up in the "wrong" bucket after a resize).
You can see this for yourself in the JDK code for HashMap.resize() (or, a little easier to see, in the Java 6 code for HashMap.transfer()).
I can't tell why two of the answers rely on HashMap.transfer for their examples, when that method is not present in java-8 at all. As such, I will provide my small input taking java-8 into consideration.
Entries in a HashMap are indeed re-hashed, but not in the sense you might think. A re-hash is basically re-computing the hash of the already provided Key#hashCode; there is a method for that:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
So basically, when you compute your hashcode, HashMap will essentially say "I don't trust you enough" and re-hash your hashcode, potentially spreading the bits better (it's actually an XOR of the upper 16 bits with the lower 16 bits).
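As a rough illustration of how that spread hash is then used (the SpreadDemo class and the sample values are made up for this sketch), java-8 picks the bucket by masking the spread hash with table.length - 1:

public class SpreadDemo {
    // Same spreading as the java-8 HashMap.hash(Object) shown above
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16;               // bin count, always a power of two
        String key = "example";
        int h = hash(key);
        int bucket = (capacity - 1) & h; // how the bin index is derived
        System.out.println("spread hash: " + h + ", bucket: " + bucket);
    }
}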
On the other hand, when HashMap is re-sized, it actually means that the number of bins/buckets is doubled; and because the number of bins is always a power of two, an entry from a current bin will either stay in the same bucket or move to the bucket offset by the current number of bins, as shown in the sketch below. You can find a bit of detail on how this is done in this question.
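Here is a small sketch of that stay-or-move rule (the ResizeDemo class and the capacity/hash values are made up for the example); the java-8 resize() checks exactly one bit, the one at the old capacity:

public class ResizeDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;           // bins double on resize
        int hash = 37;                      // some entry's stored hash

        int oldIndex = (oldCap - 1) & hash; // bin before the resize
        int newIndex = (newCap - 1) & hash; // bin after the resize

        // If the bit at oldCap is 0 the entry stays, otherwise it moves by oldCap
        boolean stays = (hash & oldCap) == 0;
        System.out.println(stays
                ? "stays at " + oldIndex
                : "moves to " + (oldIndex + oldCap));
        System.out.println("newIndex = " + newIndex); // agrees with the rule
    }
}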
So once a re-size happens, there is no extra re-hashing; actually, one more bit is taken into consideration, and thus an entry might move or stay where it is. And Gray's answer is correct in this sense: each Entry has a hash field that is computed only once – the first time you put that Entry.
I can't find it clearly documented, but changing a key value in a way that changes its hashCode() typically breaks a HashMap.
HashMap divides entries amongst b buckets. You can imagine that a key with hash h is assigned to bucket h % b.
When it receives a new entry, it works out which bucket the entry belongs to and then checks whether an equal key already exists in that bucket. It finally adds the entry to the bucket, replacing any entry with a matching key.
By changing the hash-code, the object wrongHashCode will (typically, and here actually) be directed to another bucket the second time around, so its first entry won't be found or removed.
In short, changing the hash of an already inserted key breaks the HashMap, and what you get after that is unpredictable, but may result in (a) not finding a key or (b) finding two or more equal keys.
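If a key really must be mutated, the only safe pattern is to remove it from the map while its stored hash still matches, mutate it, and then re-insert it. A sketch, reusing the question's WrongHashCode class (the SafeMutation class name is made up):

import java.util.HashMap;

public class SafeMutation {
    public static void main(String[] args) {
        HashMap<WrongHashCode, String> map = new HashMap<>();
        WrongHashCode key = new WrongHashCode();
        map.put(key, "Test1");

        // Remove while the stored hash still matches the key's hashCode()...
        String value = map.remove(key);
        // ...then mutate, then re-put under the new hash
        key.code++;
        map.put(key, value);

        System.out.println("Keys count " + map.keySet().size()); // prints 1
    }
}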