I can see that from Scala documentation scala.collection.immutable.Set is only a trait. Which one on the Set implementation is used by default ? HashSet or TreeSet (or something else) ?
I would like to know/plan the running time of certain functions.
Example:
scala> val s = Set(1,3,6,2,7,1)
res0: scala.collection.immutable.Set[Int] = Set(1, 6, 2, 7, 3)
What would be the running time of s.find(5), O(1) or O(log(n)) ?
Since same apply for Map, what is the best way to figure this out ?
The default representation of a SortedSet is an ordered binary tree which maintains the invariant that all elements in the left subtree of a node are smaller than all elements in the right subtree. That way, a simple in order traversal can return all tree elements in increasing order. Scala's class immutable.
Some Important Points about Set in ScalaBy default set in Scala are immutable. In Scala, the immutable set is defined under Scala. collection.
In this case it means "add these to the end." Also note that if a class defines ++ but not ++= then the compiler will treat x ++= y. as x = x ++ y. this is true generally for symbols ending in an equal sign (other than == , != , and = , of course).
By looking at the source code, you can find that sets up to four elements have an optimized implementation provided by EmptySet
, Set1
, Set2
, Set3
and Set4
, which simply hold the single values.
For example here's Set2
declaration (as of scala 2.11.4):
class Set2[A] private[collection] (elem1: A, elem2: A) extends AbstractSet[A] with Set[A] with Serializable
And here's the contains
implementation:
def contains(elem: A): Boolean =
elem == elem1 || elem == elem2
or the find
implementation
override def find(f: A => Boolean): Option[A] = {
if (f(elem1)) Some(elem1)
else if (f(elem2)) Some(elem2)
else None
}
Very straightforward.
For sets with more than 4 elements, the underlying implementation is an HashSet
. We can easily verify this in the REPL:
scala> Set(1, 2, 3, 4).getClass
res1: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.Set$Set4
scala> Set(1, 2, 3, 4, 5, 6).getClass
res0: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.HashSet$HashTrieSet
That being said, find
must always iterate over the whole HashSet
, since it's unsorted, so it will be O(n)
.
Conversely, a lookup operation like contains
will be O(1)
instead.
Here's a more in-depth reference about performance of scala collections in general.
Speaking of Map
, pretty much the same concepts apply. There are optimized Map
implementations up to 4 elements, and then it's an HashMap
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With