
Scala using toSet.toList vs distinct

If I want to get the unique elements of a List, I can either call distinct or call toSet.toList. Which is more efficient, and why? Is there another efficient way of doing this? My understanding is that distinct also maintains the order, whereas toSet.toList doesn't.

scala> val mylist = List(1,2,3,3,4,4,4,5,6,6,6,6,7)
mylist: List[Int] = List(1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 7)

scala> mylist.distinct
res11: List[Int] = List(1, 2, 3, 4, 5, 6, 7)

scala> mylist.toSet.toList
res12: List[Int] = List(5, 1, 6, 2, 7, 3, 4)

Soumya Simanta asked May 08 '14 20:05


1 Answer

Taken directly from the source code found here:

  /** Builds a new $coll from this $coll without any duplicate elements.
   *  $willNotTerminateInf
   *
   *  @return A new $coll which contains the first occurrence of every element of this $coll.
   */
  def distinct: Repr = {
    val b = newBuilder
    val seen = mutable.HashSet[A]()
    for (x <- this) {
      if (!seen(x)) {
        b += x
        seen += x
      }
    }
    b.result
  }

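So if order preservation is important, use distinct; otherwise the two are roughly equally expensive, since distinct itself builds a hash set of the elements it has already seen while making a single pass over the collection.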
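
To make the comparison concrete, here is a minimal sketch that puts the two approaches side by side on the list from the question. The object name DistinctVsToSet and the helper distinctSketch are hypothetical names introduced only for illustration; distinctSketch simply mirrors the loop quoted above for a plain List and is not the library's actual code path.

```scala
import scala.collection.mutable

object DistinctVsToSet {
  // Hypothetical helper (not part of the standard library): mirrors the
  // quoted loop, keeping the first occurrence of each element in order.
  def distinctSketch[A](xs: List[A]): List[A] = {
    val b = List.newBuilder[A]
    val seen = mutable.HashSet[A]()
    for (x <- xs) {
      if (!seen(x)) { // x has not been encountered yet
        b += x
        seen += x
      }
    }
    b.result()
  }

  val myList: List[Int] = List(1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 7)

  // Keeps first occurrences in their original order.
  val viaDistinct: List[Int] = myList.distinct

  // Removes duplicates too, but the resulting order depends on the
  // Set implementation and should not be relied upon.
  val viaSet: List[Int] = myList.toSet.toList

  // Same result as distinct, produced by the hand-written loop.
  val viaSketch: List[Int] = distinctSketch(myList)
}
```

Both viaDistinct and viaSet contain the same seven elements; only viaDistinct (and the sketch) is guaranteed to keep them in first-occurrence order. Either way, each element is hashed once, which is why the costs end up in the same ballpark.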

wheaties answered Oct 04 '22 21:10