
Why does HashMap resize in case of collision or worst case?

I am asking this question with respect to Java versions up to 1.7 only. I am using reflection to find out the current capacity of the HashMap. The program below puts 12 unique Person objects into a single bucket of the HashMap (all with the same hashCode). Then I put a 13th unique Person into either the same bucket or a different one (using the same or a different hashCode). In both cases, after adding this 13th element, the HashMap resizes to 32 buckets. I understand that, due to the load factor of 0.75 and the initial capacity of 16, the HashMap doubles with the 13th element. But there are still empty buckets available, and only 2 buckets are actually used for these 13 elements.

My questions are:

  1. Is my understanding correct? Am I making a mistake anywhere? Is this the expected behavior of HashMap?

  2. If all this is correct, then even though most of the 16 buckets are still empty, why does the HashMap need to double with the 13th element in this case? Isn't resizing the HashMap extra overhead and costly? What is the need to double it here, when the 13th element could be placed in any available bucket according to its hashCode?


import java.lang.reflect.Field;
import java.util.HashMap;

public class HashMapTest {
    public static void main(String[] args)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        HashMap<Person, String> hm = new HashMap<Person, String>();
        for (int i = 1; i <= 12; i++) {
            // 12 entries collide into the same bucket (stored as a linked list)
            hm.put(new Person(), "1");
        }
        System.out.println("Number of Buckets in HashMap : " + bucketCount(hm));
        System.out.println("Number of Entry in HashMap :  " + hm.size());
        System.out.println("**********************************");
        // 13th element in different bucket
        hm.put(new Person(2), "2");
        System.out.println("Number of Buckets in HashMap : " + bucketCount(hm));
        System.out.println("Number of Entry in HashMap :  " + hm.size());
    }

    public static int bucketCount(HashMap<Person, String> h)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(h);
        return table == null ? 0 : table.length;
    }
}

// Key class engineered to force collisions: equals() always returns false, so
// every Person instance is a distinct key, and hashCode() is 0 for the default
// constructor and 1 for Person(2), putting all default Persons in one bucket.
class Person {
    int age = 0;

    Person() {
    }

    Person(int a) {
        age = a;
    }

    @Override
    public boolean equals(Object obj) {
        return false;
    }

    @Override
    public int hashCode() {
        if (age != 0) {
            return 1;
        } else {
            return age;
        }
    }
}

OUTPUT

Number of Buckets in HashMap : 16
Number of Entry in HashMap :  12
**********************************
Number of Buckets in HashMap : 32
Number of Entry in HashMap :  13
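
For reference, the resize trigger itself can be inspected the same way as the bucket count, by reading HashMap's private threshold field. Below is a helper in the same style as bucketCount above; the field name "threshold" is a pre-Java-8 implementation detail, so this is a best-effort sketch rather than a supported API:

public static int threshold(HashMap<Person, String> h)
        throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
    // "threshold" holds capacity * loadFactor; the map resizes once size exceeds it.
    Field thresholdField = HashMap.class.getDeclaredField("threshold");
    thresholdField.setAccessible(true);
    return thresholdField.getInt(h);
}

Printed alongside bucketCount, this shows the threshold at 12 (16 * 0.75) before the 13th put and at 24 (32 * 0.75) afterwards.
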
asked Dec 18 '22 by Pradeep Singh

2 Answers

  1. Yes, this is the expected behavior.
  2. The HashMap doesn't care how many buckets are actually used. It only knows that the resize threshold (capacity × load factor) has been reached, which means the probability of collisions is getting too high, so the map should be resized. Even if many collisions have already happened, resizing the map would normally fix that. Not in your case, since you chose identical hashCodes on purpose, but in a more realistic case the hashCodes would have a much better distribution. HashMap can't do anything to make itself efficient if you deliberately choose bad hashCodes, and there is no point in adding complexity to handle an extreme case that should never happen and that HashMap couldn't fix anyway.
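
To make that trigger concrete, here is a simplified sketch of the pre-Java-8 resize decision (illustrative names, not the actual JDK source). The point is that the decision looks only at the total entry count, never at how many individual buckets are occupied:

// Simplified model of the pre-Java-8 resize decision (illustrative, not JDK code).
class ResizeSketch {
    static final float LOAD_FACTOR = 0.75f;
    Object[] table = new Object[16];            // the buckets
    int size = 0;                               // total number of entries
    int threshold = (int) (16 * LOAD_FACTOR);   // 16 * 0.75 = 12

    void afterInsert() {
        size++;
        // Only the entry count is compared against the threshold; empty
        // buckets are not taken into account at all.
        if (size > threshold) {
            resize(table.length * 2);           // 16 -> 32 on the 13th entry
        }
    }

    void resize(int newCapacity) {
        // The real implementation allocates a new table and rehashes every
        // existing entry into it; only the bookkeeping is shown here.
        table = new Object[newCapacity];
        threshold = (int) (newCapacity * LOAD_FACTOR);
    }
}
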
answered Jan 10 '23 by JB Nizet

Yes, the behavior you observe is the expected behavior.

The implementation of HashMap expects you to use a reasonable hashCode for the keys. It assumes that your hashCode would distribute the keys as evenly as possible among the available buckets. If you fail to do that (as you did in your example - where all the keys have the same hashCode), you will get bad performance.

Under the assumption of even distribution, it makes sense for the HashMap to double its size once you pass the load factor. It doesn't check how many buckets are actually empty (it has no way of knowing whether new entries will land in empty buckets or in occupied ones). It just tracks the average number of entries per bucket, i.e. size divided by capacity. Once that ratio exceeds the load factor, the number of buckets is doubled.
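
For contrast, here is a hypothetical key class (not part of the question) with a reasonable hashCode and equals. The resize is triggered at exactly the same point, but after doubling, the 13 entries end up spread over roughly 13 different buckets instead of 2, which is the situation the implementation is designed for:

// Hypothetical key class with a well-distributed hashCode, for contrast
// with the deliberately colliding Person above.
class DistinctPerson {
    final int age;

    DistinctPerson(int a) {
        age = a;
    }

    @Override
    public int hashCode() {
        return age; // distinct ages land in distinct buckets of a small table
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof DistinctPerson && ((DistinctPerson) obj).age == age;
    }
}

Putting new DistinctPerson(1) through new DistinctPerson(13) into a HashMap still doubles the table to 32 buckets on the 13th put, because the threshold depends only on the entry count, but the entries now occupy about 13 different buckets, so lookups stay effectively constant-time.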

answered Jan 10 '23 by Eran