I have recently been diving into Scala and (perhaps predictably) have spent quite a bit of time studying the immutable collections API in the Scala standard library.
I am writing an application that necessarily does many +/- operations on large sets. For this reason, I want to ensure that the implementation I choose is a so-called "persistent" data structure so that I avoid doing copy-on-write. I saw this answer by Martin Odersky, but it didn't really clear up the issue for me.
I wrote the following test code to compare the performance of ListSet and HashSet for add operations:
import scala.collection.immutable._
object TestListSet extends App {
var set = new ListSet[Int]
for(i <- 0 to 100000) {
set += i
}
}
object TestHashSet extends App {
var set = new HashSet[Int]
for(i <- 0 to 100000) {
set += i
}
}
Here is a rough runtime measurement of the HashSet:
$ time scala TestHashSet
real 0m0.955s
user 0m1.192s
sys 0m0.147s
And ListSet:
$ time scala TestListSet
real 0m30.516s
user 0m30.612s
sys 0m0.168s
Cons on a singly linked list is a constant-time operation, but this performance looks linear or worse. Is this performance hit related to the need to check each element of the set for object equality to conform to the no-duplicates invariant of Set? If this is the case, I realize it's not related to "persistence".
As for official documentation, all I could find was the following page, but it seems incomplete: Scala 2.8 Collections API -- Performance Characteristics. Since ListSet seems initially to be a good choice for its memory footprint, maybe there should be some information about its performance in the API docs.
In Scala, both mutable and immutable sets are available. Mutable set is those set in which the value of the object is change but, in the immutable set, the value of the object is not changed itself. By default set in Scala are immutable. In Scala, the immutable set is defined under Scala.
Immutable collections, by contrast, never change. You have still operations that simulate additions, removals, or updates, but those operations will in each case return a new collection and leave the old collection unchanged. All collection classes are found in the package scala.
Seq . The 2.13 default Seq is specifically immutable, which has more constraints on it that scala. collection.
An old question but also a good example of conclusions being drawn on the wrong foundation.
Connor, basically you're trying to do a microbenchmark. That is generally not recommended and damn hard to do properly.
Why? Because the JVM is doing many other things than executing the code in your examples. It's loading classes, doing garbage collection, compiling bytecode to native code, etc. All dynamically and based on different metrics sampled at runtime.
So you cannot conclude anything about the performance of the two collections with the above test code. For example, what you could actually be measuring could be the compilation time of the +=
method of HashSet
and garbage collection times of ListSet
. So it's a comparison between apples and pears.
To do a micro benchmark properly, you should:
+=
method).-XX:-PrintCompilation
and -XX:-PrintGC
). If either runs during the test, discard the result.I can recommend reading Oracle's recommendations for doing micro benchmarks and a great article about benchmark pitfalls by Brian Goetz.
Also, if you want to use a good tool, which does all the above for you, try Caliper by Google.
Since a set has to have with no duplicates, before adding an element, a Set must check to see if it already contains the element. This search in a list that has no guarantee of an element's position will be O(N) linear time. The same general idea applies to its remove operation.
With a HashSet, the class defines a function that picks a location for any element in O(1), which makes the contains(element) method much quicker, at the expense of taking up more space to decrease the chance of element location collisions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With