Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala default Set Implementation

Tags:

scala

I can see that from Scala documentation scala.collection.immutable.Set is only a trait. Which one on the Set implementation is used by default ? HashSet or TreeSet (or something else) ?

I would like to know/plan the running time of certain functions.

Example:

 scala> val s = Set(1,3,6,2,7,1)
 res0: scala.collection.immutable.Set[Int] = Set(1, 6, 2, 7, 3)
  • What would be the running time of s.find(5), O(1) or O(log(n)) ?

  • Since same apply for Map, what is the best way to figure this out ?

like image 995
Marc Simon Avatar asked Nov 13 '14 05:11

Marc Simon


People also ask

Is set ordered in Scala?

The default representation of a SortedSet is an ordered binary tree which maintains the invariant that all elements in the left subtree of a node are smaller than all elements in the right subtree. That way, a simple in order traversal can return all tree elements in increasing order. Scala's class immutable.

Is a set immutable in Scala?

Some Important Points about Set in ScalaBy default set in Scala are immutable. In Scala, the immutable set is defined under Scala. collection.

What does ++ mean in Scala?

In this case it means "add these to the end." Also note that if a class defines ++ but not ++= then the compiler will treat x ++= y. as x = x ++ y. this is true generally for symbols ending in an equal sign (other than == , != , and = , of course).


1 Answers

By looking at the source code, you can find that sets up to four elements have an optimized implementation provided by EmptySet, Set1, Set2, Set3 and Set4, which simply hold the single values.

For example here's Set2 declaration (as of scala 2.11.4):

class Set2[A] private[collection] (elem1: A, elem2: A) extends AbstractSet[A] with Set[A] with Serializable

And here's the contains implementation:

def contains(elem: A): Boolean =
  elem == elem1 || elem == elem2

or the find implementation

override def find(f: A => Boolean): Option[A] = {
  if (f(elem1)) Some(elem1)
  else if (f(elem2)) Some(elem2)
  else None
}

Very straightforward.

For sets with more than 4 elements, the underlying implementation is an HashSet. We can easily verify this in the REPL:

scala> Set(1, 2, 3, 4).getClass
res1: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.Set$Set4

scala> Set(1, 2, 3, 4, 5, 6).getClass
res0: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.HashSet$HashTrieSet

That being said, find must always iterate over the whole HashSet, since it's unsorted, so it will be O(n). Conversely, a lookup operation like contains will be O(1) instead.

Here's a more in-depth reference about performance of scala collections in general.

Speaking of Map, pretty much the same concepts apply. There are optimized Map implementations up to 4 elements, and then it's an HashMap.

like image 54
Gabriele Petronella Avatar answered Oct 27 '22 18:10

Gabriele Petronella