Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String IdentityHashMap vs HashMap performance

Identity HashMap is special implementation in Java which compares the objects reference instead of equals() and also uses identityHashCode() instead of hashCode(). In addition, it uses linear-probe hash table instead of Entry list.

Map<String, String> map = new HashMap<>(); 
Map<String, String> iMap = new IdentityHashMap<>();

Does that mean for the String keys IdentifyHashMap will be usually faster if tune correctly ?

See this example:

public class Dictionary {

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("/usr/share/dict/words"));

        String line;
        ArrayList<String> list = new ArrayList<String>();

        while ((line = br.readLine()) != null) {
            list.add(line);
        }
        System.out.println("list.size() = " + list.size());
        Map<String, Integer> iMap = new IdentityHashMap<>(list.size());
        Map<String, Integer> hashMap = new HashMap<>(list.size());

        long iMapTime = 0, hashMapTime = 0;

        long time;
        for (int i = 0; i < list.size(); i++) {
            time = System.currentTimeMillis();
            iMap.put(list.get(i), i);
            time = System.currentTimeMillis() - time;
            iMapTime += time;
            time = System.currentTimeMillis();
            hashMap.put(list.get(i), i);
            time = System.currentTimeMillis() - time;
            hashMapTime += time;
        }

        System.out.println("iMapTime = " + iMapTime + " hashMapTime = " + hashMapTime);
    }

}

Tried very basic performance check. I am reading dictionary words (235K) & pushing into the both maps. It prints:

list.size() = 235886
iMapTime = 101 hashMapTime = 617 

I think this is very good improvment to ignore, unless I am doing something wrong here.

like image 801
Himanshu Ahire Avatar asked Apr 11 '15 15:04

Himanshu Ahire


2 Answers

How does IdentityHashMap<String,?> work?

To make IdentityHashMap<String,?> work for arbitrary strings, you'll have to String.intern() both the keys you put() and potential keys you pass to get(). (Or use an equivalent mechanism.)

Note: unlike stated in @m3th0dman's answer, you don't need to intern() the values.

Either way, interning a string ultimately requires looking it up in some kind of hash table of already interned strings. So unless you had to intern your strings for some other reason anyway (and thus already paid the cost), you won't get much of an actual performance boost out of this.

So why does the test show that you can?

Where your test is unrealistic is that you keep the exact list of keys you used with put() and you iterate across them one by one in list order. Note (the same could be achieved by inserting the elements into a LinkedHashMap and simply calling iterator() on its entry set.

What's the point of IdentityHashMap then?

There are scenarios where it is guaranteed (or practically guaranteed) that object identity is the same as equals(). Imagine trying to implement your own ThreadLocal class for example, you'll probably write something like this:

public final class ThreadLocal<T> {
   private final IdentityHashMap<Thread,T> valueMap;
   ...
   public T get() {
       return valueMap.get( Thread.currentThread() );
   }
}

Because you know that threads have no notion of equality beyond identity. Same goes if your map keys are enum values and so on.

like image 123
biziclop Avatar answered Sep 23 '22 14:09

biziclop


You will see significantly faster performance on IdentityHashMap, however that comes at a substantial cost.

You must be absolutely sure that you will never ever have objects added to the map that have the same value but different identities.

That's hard to guarantee both now and for the future, and a lot of people make mistaken assumptions.

For example

String t1 = "test";
String t2 = "test";

t1==t2 will return true.

String t1 = "test";
String t2 = new String("test");

t1==t2 will return false.

Overall my recommendation is that unless you absolutely critically need the performance boost and know exactly what you are doing and heavily lock down and comment access to the class then by using IdentityHashMap you are opening yourself up to massive risks of very hard to track down bugs in the future.

like image 26
Tim B Avatar answered Sep 23 '22 14:09

Tim B