Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala most efficient Map implementation

What is the best bet when a scala Map is created and used in a single thread? (Like the java analogous best bet on StringBuffer compared with StringBuilder for constructing strings).

The kind of constraints to choose the Map are:

  • the map is created once with multiple additions/updates on possible already existing keys
  • nothing is deleted from map - so this kind of operation might not be needed
  • map could have thousands of values (not so big)
  • the usage is from the same thread (so there is no concern about accessing/updating the map in parallel)
  • the signature of map might be Map[String,T]. In case there is a reason a Map[Int/Long,T] could be used.

I investigated

  • collection.immutable.Map (that initially works optimized for few keys)
  • collection.immutable.HashMap
  • collection.mutable.OpenHashMap
  • collection.mutable.HashMap

The test showed that there is no clear winner with 50000 keys. I found some

  • simple microbenchmarks comparing Scala vs Java mutable map performance - 201 that suggest that OpenHashMap might be a good bet
  • New collections in Scala 2.7.2 - 2008
  • the official scaladoc Performance Characteristics doesn't offer orientations except that all Map implementation take eC - effectively constant time for lookup and add.

Nevertheless the question is what is the safest bet in general on such circumstances and why?

like image 627
raisercostin Avatar asked Jan 09 '23 17:01

raisercostin


1 Answers

If your map is not extremely tiny and not huge, and your key is a String, then collection.mutable.AnyRefMap is a good bet. collection.mutable.LongMap is even faster if you can have Long keys. The reason they exist is precisely to be fast for common use cases.

If most maps are incredibly tiny (0-4 elements), then LinkedHashMap tends to be best because it avoids the overhead of a hash table. (Immutable maps are also not bad when 4 or fewer elements.)

If maps are really huge (100s of millions of key/value pairs) then the standard collection.mutable.HashMap is the way to go because performance degrades slightly more gracefully as you run out of space for separate keys.

like image 152
Rex Kerr Avatar answered Jan 14 '23 14:01

Rex Kerr