 

How to see the distribution of keys in a HashMap?

Tags:

java

hashmap

When using a hash map, it's important to evenly distribute the keys over the buckets.

If all keys end up in the same bucket, you essentially end up with a list.

Is there a way to "audit" a HashMap in Java in order to see how well the keys are distributed?

I tried subclassing it and iterating over its Entry<K,V>[] table field, but the field isn't visible.

asked Apr 11 '15 at 18:04 by wvdz




2 Answers

I tried subtyping it and iterating Entry[] table, but it's not visible

Use the Reflection API!

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// On JDK 16+ java.util is not open to reflection by default; run with:
//   java --add-opens java.base/java.util=ALL-UNNAMED Main
public class Main {
    // Simulates instances that are never equal but always go to the same bucket.
    static class A {
        @Override
        public boolean equals(Object obj) { return false; }

        @Override
        public int hashCode() { return 42; }
    }

    public static void main(String[] args) throws Exception {
        // Test data
        HashMap<A, String> map = new HashMap<>(4);
        map.put(new A(), "abc");
        map.put(new A(), "def");

        // Access the internal bucket array (the "table" field of HashMap)
        Field table = HashMap.class.getDeclaredField("table");
        table.setAccessible(true);
        Map.Entry<?, ?>[] realTable = (Map.Entry<?, ?>[]) table.get(map);

        // Iterate over the buckets and pretty-print each one
        for (int i = 0; i < realTable.length; i++) {
            System.out.println(String.format("Bucket : %d, Entry: %s", i, bucketToString(realTable[i])));
        }
    }

    private static String bucketToString(Map.Entry<?, ?> entry) throws Exception {
        if (entry == null) return null;
        StringBuilder sb = new StringBuilder();

        // Access the "next" field declared in HashMap$Node; looking it up there
        // (rather than on entry.getClass()) also works for subclasses such as TreeNode
        Field next = Class.forName("java.util.HashMap$Node").getDeclaredField("next");
        next.setAccessible(true);

        // Walk the chain of nodes in this bucket
        while (entry != null) {
            sb.append(entry);
            entry = (Map.Entry<?, ?>) next.get(entry);
            if (entry != null) sb.append(" -> ");
        }
        return sb.toString();
    }
}

Running it on Java 8 or later prints something like this to STDOUT:

 Bucket : 0, Entry: null 
 Bucket : 1, Entry: null 
 Bucket : 2, Entry: Main$A@2a=abc -> Main$A@2a=def 
 Bucket : 3, Entry: null
answered Nov 15 '22 at 17:11 by Andrey Taptunov


HashMap assigns entries to buckets based on the hash codes returned by the hashCode() method of your key objects, so I guess you are really asking how evenly distributed those hash code values are. You can get hold of the key objects using Map.keySet().
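A minimal sketch of that idea, assuming you only want a rough picture of how the raw hashCode() values of the keys spread over a chosen number of buckets. The class name BucketHistogram, the histogram helper, the sample keys and the bucket count of 16 are all hypothetical, and Math.floorMod stands in for whatever index computation a real HashMap performs; the code never touches the map's internal table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class BucketHistogram {
    // Counts how many keys land in each of `buckets` slots, using only the keys'
    // own hashCode() values. This does not inspect HashMap's internal table.
    static int[] histogram(Set<?> keys, int buckets) {
        int[] counts = new int[buckets];
        for (Object key : keys) {
            int h = (key == null) ? 0 : key.hashCode(); // HashMap allows one null key
            counts[Math.floorMod(h, buckets)]++;        // floorMod keeps negative codes in range
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: 100 String keys
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }

        int[] counts = histogram(map.keySet(), 16);
        System.out.println("keys per bucket: " + Arrays.toString(counts));
        System.out.println("largest bucket:  " + Arrays.stream(counts).max().getAsInt());
    }
}
```

A histogram with a few very tall buckets and many empty ones points to a poorly distributed hashCode(); a roughly flat one is what you want.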

Now, the OpenJDK and Oracle implementations of HashMap do not use the key hash codes directly; they apply an additional hashing function to the supplied hash codes before distributing them over the buckets. But that is an implementation detail you should neither rely on nor use, so ignore it and just ensure that the hashCode() methods of your key classes are well distributed.
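To illustrate why that extra step exists (not to rely on it), here is a small sketch that reproduces the bit-spreading found in the OpenJDK 8+ source of HashMap.hash(), h ^ (h >>> 16), followed by the (n - 1) & hash bucket index for a power-of-two table size n. The class and method names are hypothetical, and the spreading function is copied from one JDK version's internals, not from any public API. The sample hash codes differ only in their upper 16 bits.

```java
import java.util.HashSet;
import java.util.Set;

public class SpreadDemo {
    // Reproduces the bit-spreading step in OpenJDK 8+ HashMap.hash():
    // XOR the upper 16 bits into the lower 16. Implementation detail, shown for illustration only.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table size, computed as (tableSize - 1) & hash.
    static int bucketIndex(int hash, int tableSize) {
        return (tableSize - 1) & hash;
    }

    // Counts distinct buckets used by the hash codes i << 16 for i = 0..keyCount-1,
    // i.e. codes that differ only in their upper bits.
    static int distinctBuckets(int keyCount, int tableSize, boolean applySpread) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < keyCount; i++) {
            int h = i << 16; // same as Integer.valueOf(i << 16).hashCode()
            used.add(bucketIndex(applySpread ? spread(h) : h, tableSize));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("without spreading: " + distinctBuckets(16, 16, false) + " bucket(s)");
        System.out.println("with spreading:    " + distinctBuckets(16, 16, true) + " bucket(s)");
    }
}
```

With a 16-slot table, all 16 codes share the same low four bits, so without spreading they pile into a single bucket; after spreading they occupy all 16. A hashCode() that is already well distributed in its low bits does not need this rescue.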

Examining the actual hash codes of some sample key objects is unlikely to tell you anything useful unless your hash code method is very poor. You would be better off doing a basic theoretical analysis of your hash code method. This is not as scary as it might sound. You may assume (indeed, you have no choice but to assume) that the hash code methods of the standard Java classes are well distributed. Then you just need to check that the way you combine the hash codes of your data members behaves well for the expected values of those members. Only if your data members have values that are correlated in some peculiar way is this likely to be a problem.
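As a sketch of that kind of check, assume a hypothetical key with two int members x and y whose expected values are correlated (here, points on the line x + y = 100). The code counts how many distinct hash codes two combining functions produce for that data; the names CombinerCheck, distinctHashes and diagonal are illustrative only. Adding the members collapses every point to the same code, while the 31 * h + next pattern (the one Arrays.hashCode, and therefore Objects.hash, uses) keeps them apart.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntBinaryOperator;

public class CombinerCheck {
    // Counts how many distinct hash codes a combining function produces for
    // the sample (x, y) pairs; fewer distinct codes means more collisions.
    static int distinctHashes(int[][] samples, IntBinaryOperator combine) {
        Set<Integer> codes = new HashSet<>();
        for (int[] p : samples) {
            codes.add(combine.applyAsInt(Integer.hashCode(p[0]), Integer.hashCode(p[1])));
        }
        return codes.size();
    }

    // Hypothetical correlated data: the n points (i, 100 - i) on the line x + y = 100.
    static int[][] diagonal(int n) {
        int[][] pts = new int[n][];
        for (int i = 0; i < n; i++) {
            pts[i] = new int[] {i, 100 - i};
        }
        return pts;
    }

    public static void main(String[] args) {
        int[][] pts = diagonal(101);
        int additive = distinctHashes(pts, (hx, hy) -> hx + hy);        // poor: x + y is constant here
        int polynomial = distinctHashes(pts, (hx, hy) -> 31 * hx + hy); // the 31 * h + next pattern
        System.out.println("x + y:      " + additive + " distinct hash code(s)");
        System.out.println("31 * x + y: " + polynomial + " distinct hash code(s)");
    }
}
```

For these 101 points the additive combiner yields a single hash code and the polynomial one yields 101, because 31x + (100 - x) = 30x + 100 differs for every x. The same reasoning, done on paper for your members' expected value ranges, is the theoretical analysis described above.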

answered Nov 15 '22 at 17:11 by Raedwald