Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Java 7 HashMap in Java 8

I have updated a Java application to Java 8. The application heavily relies on HashMaps. When I run the benchmarks, I see unpredictable behavoir. For some inputs, the application runs faster than before, but for larger inputs, it's constantly slower.

I've checked the profiler and the most time consuming operation is HashMap.get. I suspect the changes are due to the HashMap modification in Java 8, but it may not be true, as I have changed some other parts.

Is there an easy way that I hook in the original Java 7 HashMap into my Java 8 application so that I only change the hashmap implementation to see if I still observe the change in performance.

The following is a minimal program that tries to simulate what my application is doing. The basic idea is that i need to share nodes in the application. At some runtime point, a node should be retrieved or created if it already does not exist based on some integer properties. The following only uses two integer, but in the real application I have one, two and three integer keys.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class Test1 {

static int max_k1 = 500;
static int max_k2 = 500;

static Map<Node, Node> map;
static Random random = new Random();

public static void main(String[] args) {
    for (int i = 0; i < 15; i++) {
        long start = System.nanoTime();
        run();
        long end = System.nanoTime();
        System.out.println((end - start) / 1000_000);
    }
}

private static void run() {
    map = new HashMap<>();
    for (int i = 0; i < 10_000_000; i++) {
        Node key = new Node(random.nextInt(max_k1), random.nextInt(max_k2));
        Node val = getOrElseUpdate(key);
    }
}

private static Node getOrElseUpdate(Node key) {
    Node val;
    if ((val = map.get(key)) == null) {
        val = key;
        map.put(key, val);
    }
    return val;
}

private static class Node {

    private int k1;
    private int k2;

    public Node(int k1, int k2) {
        this.k1 = k1;
        this.k2 = k2;
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + k1;
        result = 31 * result + k2;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;

        if (!(obj instanceof Node))
            return false;

        Node other = (Node) obj;

        return k1 == other.k1 && k2 == other.k2;
    }
  }
}

The benchmarking is primitive, but still, this is the result of 15 runs on Java 8:

8143
7919
7984
7973
7948
7984
7931
7992
8038
7975
7924
7995
6903
7758
7627

and this is for Java 7:

7247
6955
6510
6514
6577
6489
6510
6570
6497
6482
6540
6462
6514
4603
6270

The benchmarking is primitive, so I appreciate if someone who is familiar with JMH or other benchmarking tools run it, but from what I observe the results are better for Java 7. Any ideas?

like image 649
Wickoo Avatar asked Jan 03 '15 21:01

Wickoo


People also ask

How does HashMap & Collections differs from Java 7 & 8?

In Java 7 after calculating hash from hash function if more then one element has same hash than they are searched by linear search so it's complexity is (n). In Java 8 that search is performed by binary search so the complexity will become log(n).

How HashMap works internally in Java 8 with example?

In Java 8, HashMap replaces linked list with a binary tree when the number of elements in a bucket reaches certain threshold. While converting the list to binary tree, hashcode is used as a branching variable.

What is default size of HashMap?

Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).


1 Answers

Your hashCode() is very poor. In example you posted you have 250000 unique values but only 15969 unique hash codes. Because of lot of collisions, Java 8 swaps lists with trees. In your case it only adds overhead, because many elements not only have the same position in hash table but also the same hash code. The tree ends up as a linked list anyway.

There are couple of ways to fix this:

  1. Improve your hashCode. return k1 * 500 + k2; resolves the issue.

  2. Use THashMap. Open addressing should work better in case of collisions.

  3. Make Node implement Comparable. This will be used by HashMap to construct balanced tree in case of conflicts.

like image 63
Piotr Praszmo Avatar answered Oct 08 '22 01:10

Piotr Praszmo