If I want to get the unique elements of a List, I can either call distinct or call toSet.toList. Which is more efficient, and why? Is there any other efficient way of doing this? My understanding is that distinct will also maintain the order, whereas toSet.toList won't.
scala> val mylist = List(1,2,3,3,4,4,4,5,6,6,6,6,7)
mylist: List[Int] = List(1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 7)
scala> mylist.distinct
res11: List[Int] = List(1, 2, 3, 4, 5, 6, 7)
scala> mylist.toSet.toList
res12: List[Int] = List(5, 1, 6, 2, 7, 3, 4)
Taken directly from the source code found here:
/** Builds a new $coll from this $coll without any duplicate elements.
 *  $willNotTerminateInf
 *
 *  @return A new $coll which contains the first occurrence of every element of this $coll.
 */
def distinct: Repr = {
  val b = newBuilder
  val seen = mutable.HashSet[A]()
  for (x <- this) {
    if (!seen(x)) {
      b += x
      seen += x
    }
  }
  b.result
}
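The excerpt relies on collection internals (Repr, newBuilder), so it can't be pasted into a REPL as is. Here is a minimal standalone sketch of the same idea for a List. The names DistinctSketch and distinctSketch are made up for illustration and are not part of the library:

```scala
import scala.collection.mutable

object DistinctSketch {
  // Hypothetical helper, not the library method: a single pass over the
  // list that keeps an element only the first time it is seen.
  def distinctSketch[A](xs: List[A]): List[A] = {
    val b = List.newBuilder[A]
    val seen = mutable.HashSet[A]()
    for (x <- xs) {
      if (!seen(x)) {
        b += x     // first occurrence: keep it, in encounter order
        seen += x  // remember it so later duplicates are skipped
      }
    }
    b.result()
  }
}
```

Because elements are appended to the builder in the order they are first encountered, the result keeps the original ordering.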
So it appears that if order preservation is important, use distinct. Otherwise the two are roughly equally expensive, since each makes one pass over the list and uses a hash-based set to drop duplicates.
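To make the comparison concrete, here is a small sketch that runs both approaches on the list from the question. The object name DedupDemo is made up for illustration. It only checks that the two results hold the same elements and that distinct keeps first-occurrence order. It does not measure speed, which would need a proper benchmark.

```scala
object DedupDemo {
  val mylist = List(1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 7)

  // Keeps the first occurrence of each element, in the original order.
  val viaDistinct = mylist.distinct

  // Same elements, but the order depends on the Set implementation.
  val viaSet = mylist.toSet.toList

  def main(args: Array[String]): Unit = {
    println(viaDistinct)
    // Sorting both sides removes the ordering difference before comparing.
    println(viaSet.sorted == viaDistinct.sorted)
  }
}
```

If the order of viaSet matters to the caller, it has to be restored explicitly (for example by sorting), which is extra work that distinct avoids.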