
Java HashMap resizing

Tags: java, hashmap

Let's assume we have some code:

import java.util.HashMap;

class WrongHashCode {
    public int code = 0;

    @Override
    public int hashCode() {
        return code;
    }
}

public class Rehashing {
    public static void main(String[] args) {

        // Initial capacity is 2 and load factor is 75%
        HashMap<WrongHashCode, String> hashMap = new HashMap<>(2, 0.75f);

        WrongHashCode wrongHashCode = new WrongHashCode();
        // Put the object that will be "lost"
        hashMap.put(wrongHashCode, "Test1");

        // Change the hashcode of the same key object
        wrongHashCode.code++;

        // This put triggers a resize, because the load factor threshold is crossed
        hashMap.put(wrongHashCode, "Test2");

        // Always prints 2
        System.out.println("Keys count " + hashMap.keySet().size());
    }
}

So, my question is: why, after resizing the hashMap (which, as far as I understand, involves rehashing the keys), do we still have 2 keys in the keySet instead of 1, given that the key object is the same for both existing key-value pairs?

asked Sep 05 '17 by Vladyslav Nikolaiev

4 Answers

So, my question is why after resizing hashMap (that, as far, as I understand involves rehashing keys)

It actually does not involve rehashing the keys, at least not in the HashMap code except under certain circumstances (see below). It involves repositioning them in the map's buckets. Inside HashMap there is an Entry class which has the following fields:

final K key;
V value;
Entry<K,V> next;
int hash;

The hash field is the stored hashcode for the key, calculated when the put(...) call is made. This means that if you change the hashcode of your key object, it will not affect the entry already in the HashMap unless you re-put the key into the map. Of course, if you change the hashcode for a key, you won't even be able to find it in the HashMap, because its current hashcode differs from the one stored in the entry.
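A minimal sketch of that effect (the class names MutableKey and CachedHashDemo are made up for this example; the key mirrors the question's WrongHashCode):

```java
import java.util.HashMap;

// Hypothetical key class whose hashCode can change after insertion
class MutableKey {
    int code = 0;

    @Override
    public int hashCode() {
        return code;
    }
}

public class CachedHashDemo {
    public static void main(String[] args) {
        HashMap<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey();
        map.put(key, "value");            // entry stores the hash computed now

        key.code = 42;                    // hashCode changes after insertion

        // Lookup now hashes to a different bucket, so the entry is not found,
        // even though it is still physically in the map
        System.out.println(map.get(key));         // null
        System.out.println(map.containsKey(key)); // false
        System.out.println(map.size());           // 1
    }
}
```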

we still have 2 keys in keySet instead of 1 (since key object is same for both existing KV pairs) ?

So even though you've changed the hashcode of the single key object, the map holds 2 entries for it, each with a different stored hash field.


All that said, there is code inside HashMap which may rehash the keys when a HashMap is resized: see the package-protected HashMap.transfer(...) method in JDK 7 (at least). This is why the hash field above is not final. However, it is only used when initHashSeedAsNeeded(...) returns true, enabling "alternative hashing". The following sets the threshold on the number of entries at which alt-hashing is enabled:

-Djdk.map.althashing.threshold=1

With this set on the VM, I can actually get hashCode() called again when the resizing happens, but I'm still not able to get the 2nd put(...) to be seen as an overwrite. Part of the problem is that the HashMap.hash(...) method XORs the key's hashcode with the internal hash seed, which is changed when the resizing happens, but only after the put(...) has recorded the new hash code for the incoming entry.

answered Oct 18 '22 by Gray


The HashMap actually caches the hashCode for each key (as a key's hashCode may be expensive to compute). So, although you changed the hashCode for an existing key, the Entry to which it is linked in the HashMap still has the old code (and hence gets put in the "wrong" bucket after resize).

You can see this for yourself in the JDK source for HashMap.resize() (or, a little easier to follow, in the Java 6 code for HashMap.transfer()).

answered Oct 18 '22 by jtahlborn


I can't tell why two of the answers rely on HashMap.transfer for their examples, when that method is not present in Java 8 at all. So I will give my small input with Java 8 in mind.

Entries in a HashMap are indeed re-hashed, but not in the sense you might think. A re-hash basically means re-processing the hashcode you already provided via Key#hashCode; there is a method for that:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

So basically when you compute your hashcode, HashMap will essentially say "I don't trust you enough" and re-hash your hashcode to potentially spread the bits better (it's actually an XOR of the hashcode with itself shifted right by 16, so the upper 16 bits are mixed into the lower 16).
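To see what this spreading does, here's a small illustrative computation (HashSpreadDemo is a made-up name; spread mirrors the JDK 8 hash method shown above):

```java
public class HashSpreadDemo {
    // Same spreading function as java.util.HashMap.hash(Object)
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = 0x12345678;
        // The upper 16 bits (0x1234) are XORed into the lower 16 bits,
        // so high bits influence the bucket index even for small tables
        System.out.printf("%08x%n", h ^ (h >>> 16)); // 1234444c
    }
}
```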

On the other hand, when a HashMap is re-sized, it actually means that the number of bins/buckets is doubled; and because the number of bins is always a power of two, an entry from a current bin will either stay in the same bucket or move to the bucket offset by the old number of bins. You can find a bit of detail on how this is done in this question.
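A quick sketch of that index arithmetic (the bucket index is hash & (capacity - 1), since capacities are powers of two; the class name is made up):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;

        int h1 = 5;   // bit at the oldCap position is 0 -> entry stays put
        int h2 = 21;  // 21 = 16 + 5, that bit is 1 -> entry moves by oldCap

        System.out.println(h1 & (oldCap - 1)); // 5
        System.out.println(h1 & (newCap - 1)); // 5  (same bucket)
        System.out.println(h2 & (oldCap - 1)); // 5
        System.out.println(h2 & (newCap - 1)); // 21 (old index + oldCap)
    }
}
```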

So once a re-size happens, there is no extra re-hashing; effectively one more bit of the hash is taken into consideration, and thus an entry might move or stay where it is. And Gray's answer is correct in this sense: each Entry has a hash field that is computed only once, the first time you put that Entry.

answered Oct 18 '22 by Eugene


I can't find it clearly documented, but mutating a key in a way that changes its hashCode() typically breaks a HashMap.

HashMap divides entries amongst b buckets. You can imagine that a key with hash h is assigned to bucket h % b. When the map receives a new entry, it works out which bucket the entry belongs to and then checks whether an equal key already exists in that bucket. It finally adds the entry to that bucket, replacing any matched key's entry.

By changing the hash code, the object wrongHashCode will be (typically, and here actually) directed to another bucket the second time around, so its first entry won't be found or removed.

In short, changing the hash of an already inserted key breaks the HashMap; what you get after that is unpredictable, but it may result in (a) not finding a key or (b) finding two or more equal keys.
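A sketch showing both failure modes and that restoring the original hashcode makes the first entry reachable again (BrokenKey and BrokenMapDemo are made-up names mirroring the question's setup):

```java
import java.util.HashMap;

// Hypothetical mutable key, as in the question
class BrokenKey {
    int code = 0;

    @Override
    public int hashCode() {
        return code;
    }
}

public class BrokenMapDemo {
    public static void main(String[] args) {
        HashMap<BrokenKey, String> map = new HashMap<>();
        BrokenKey key = new BrokenKey();
        map.put(key, "first");

        key.code++;                       // (a) the key can no longer be found
        System.out.println(map.get(key)); // null

        map.put(key, "second");           // (b) the same object is now in twice
        System.out.println(map.size());   // 2

        key.code--;                       // restoring the hash finds "first" again
        System.out.println(map.get(key)); // first
    }
}
```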

answered Oct 18 '22 by Persixty