Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Scala hashmap slow?

And what can be done about it?

I have run some tests and it seems that Scala Hashmap is much slower than a Java HashMap. Please prove me wrong!

For me the whole point of Hashmap is to get quick access to a value from a given key. So I find myself resorting to using a Java HashMap when speed matters, which is a bit sad. I'm not experienced enough to say for sure but it seems that the more you mix Java and Scala the more problems you are likely to face.

test("that scala hashmap is slower than java") {     val javaMap = new util.HashMap[Int,Int](){       for (i <- 1 to 20)       put(i,i+1)     }      import collection.JavaConverters._     val scalaMap = javaMap.asScala.toMap      // check is a scala hashmap     assert(scalaMap.getClass.getSuperclass === classOf[scala.collection.immutable.HashMap[Int,Int]])      def slow = {       val start = System.nanoTime()       for (i <- 1 to 1000) {         for (i <- 1 to 20) {           scalaMap(i)         }       }       System.nanoTime() - start     }      def fast = {       val start = System.nanoTime()       for (i <- 1 to 1000) {         for (i <- 1 to 20) {           javaMap.get(i)         }       }       System.nanoTime() - start     }      val elapses: IndexedSeq[(Long, Long)] = {       (1 to 1000).map({_ => (slow,fast)})     }      var elapsedSlow = 0L     var elapsedFast = 0L     for ((eSlow,eFast) <- elapses) {       elapsedSlow += eSlow       elapsedFast += eFast     }      assert(elapsedSlow > elapsedFast)      val fraction : Double = elapsedFast.toDouble/elapsedSlow     println(s"slower by factor of: $fraction") } 

Am I missing something?

Answer Summary

As of now, when comparing Java 8 to Scala 2.11, it appears that Java HashMap is notably speedier at lookups (for a low number of keys) than the Scala offerings - with the exception of LongMap (if your keys are Ints/Longs).

The performance difference is not so great that it should matter in most use cases. Hopefully Scala will improve the speed of their Maps. In the mean time, if you need performance (with non-integer keys) use Java.

Int keys, n=20
Long(60), Java(93), Open(170), MutableSc(243), ImmutableSc(317)

case object keys, n=20
Java(195), AnyRef(230)

like image 518
MS-H Avatar asked Feb 26 '15 14:02

MS-H


People also ask

Why is HashMap slow?

HashMap's Bottleneck The bucket is actually a simple linked list. Finding elements in the linked list isn't very fast (O(n)) but that's not a problem if the list is very small. Problems start when we have a lot of hash code collisions, so instead of a big number of small buckets, we have a small number of big buckets.

How does HashMap work in Scala?

HashMap is a part of Scala Collection's. It is used to store element and return a map. A HashMap is a combination of key and value pairs which are stored using a Hash Table data structure. It provides the basic implementation of Map.

What is difference between MAP and HashMap in Scala?

Key Differences between Map and HashMapThe Map is an interface, and HashMap is a class of the Java collection framework. The Map interface can be implemented by using its implementing classes. In comparison, the HashMap class implements the Map interface. The Map contains unique key-pair values.


1 Answers

First of all: doing JVM benchmarks using nanoTime is extremely error-prone. Use a microbenchmarking framework such as Thyme, Caliper or JMH

Second: you are comparing a mutable java hash map with an immutable scala hash map. Immutable collections can be remarkably fast, but there are some cases where they will never be as fast as mutable data structures.

Here is a proper microbenchmark of mutable java hash map vs. immutable scala hash map: https://gist.github.com/rklaehn/26c277b2b5666ec4b372

As you can see, the scala immutable map is a bit faster than the java mutable map. Note that this will not be the case once you go to larger maps, because an immutable data structure has to do some compromises to enable structural sharing. I would guess that in both cases, the dominant performance issue is boxing of the ints to Integer.

Update: if you really want a mutable hash hap with ints as keys, the right choice from the scala collections library is scala.collection.mutable.LongMap. This uses a long as key, and has much better performance than the generic Map because it does not have to box the value. See results from the gist.

Update 2: If your key extends from AnyRef (like e.g. a String), your best bet for a high performance mutable map is scala.collection.mutable.AnyRefMap

like image 111
Rüdiger Klaehn Avatar answered Sep 29 '22 04:09

Rüdiger Klaehn