
How to remove duplicates from a list then sort by most frequent

Tags:

scala

I have a list of assorted keywords, some of which repeat. I need to generate a list of the distinct keywords, sorted by how often each one appeared in the original list (most frequent first).

What would be the idiomatic Scala for that? Here is a working but ugly implementation:

val keys = List("c","a","b","b","a","a")
keys.groupBy(p => p).toList.sortWith( (a,b) => a._2.size > b._2.size ).map(_._1)
// List("a","b","c")
Johnny Everson asked Dec 02 '25 06:12


2 Answers

Shorter version:

keys.distinct.sortBy(keys count _.==).reverse

That is not particularly efficient, however: count rescans the whole list once for every distinct key. The groupBy version ought to perform better, though it can be improved:

// negate the group size so that the most frequent keys come first
keys.groupBy(identity).toSeq.sortBy(-_._2.size).map(_._1)

One can also avoid reversing the sorted list in the first version by declaring an Ordering:

val ord = Ordering by (keys count (_: String).==)
keys.distinct.sorted(ord.reverse)

Note that reverse in this version does not touch the list; it just produces a new Ordering that compares in the opposite direction to the original. This version also suggests a way to get better performance:

val freq = collection.mutable.Map.empty[String, Int] withDefaultValue 0
keys foreach (k => freq(k) += 1)
val ord = Ordering by freq
keys.distinct.sorted(ord.reverse)
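
For comparison, the same count-once idea can be sketched without a mutable map. This is only an illustration, not part of the answer above: it assumes Scala 2.13 or later (for groupMapReduce), and the names FrequencySort and byFrequency are hypothetical.

```scala
object FrequencySort {
  // Hypothetical helper: distinct keys, most frequent first.
  def byFrequency(keys: List[String]): List[String] = {
    // Count every key in a single pass (groupMapReduce requires Scala 2.13+).
    val counts: Map[String, Int] = keys.groupMapReduce(identity)(_ => 1)(_ + _)
    // sortBy is stable, so equally frequent keys keep their first-appearance order.
    keys.distinct.sortBy(k => -counts(k))
  }
}
```

With the list from the question, FrequencySort.byFrequency(List("c","a","b","b","a","a")) gives List("a","b","c"). Sorting keys.distinct rather than the grouped map keeps ties in a predictable order, since a Map makes no promise about iteration order.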
Daniel C. Sobral answered Dec 04 '25 22:12


Nothing wrong with that implementation that comments can't fix! Seriously, break it down a bit and describe what each step does and why you're taking it.

The result is perhaps not as "concise", but the purpose of concise code in Scala is to make code more readable. When concise code is not clear, it's time to back up, break it up (introduce well-named local variables), and comment.
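
As a rough sketch of what that could look like for the implementation in the question, the one-liner might be split into named steps as below. The object, method, and variable names are made up for illustration; the logic is the question's groupBy approach, with sortWith swapped for sortBy on a negated count.

```scala
object KeywordRanking {
  // Hypothetical name: returns the distinct keywords, most frequent first.
  def mostFrequentFirst(keywords: List[String]): List[String] = {
    // Step 1: bucket equal keywords together, e.g. "a" -> List("a", "a", "a").
    val occurrencesByKeyword: Map[String, List[String]] = keywords.groupBy(identity)

    // Step 2: only the size of each bucket matters for ranking.
    val countByKeyword: List[(String, Int)] =
      occurrencesByKeyword.toList.map { case (keyword, occurrences) => (keyword, occurrences.size) }

    // Step 3: sort by descending count (negating the count flips the default ascending order).
    val rankedByCount: List[(String, Int)] = countByKeyword.sortBy { case (_, count) => -count }

    // Step 4: the counts were only needed for ranking, so drop them.
    rankedByCount.map { case (keyword, _) => keyword }
  }
}
```

Keywords with equal counts come out in whatever order the intermediate Map yields them, which is worth a comment of its own if callers care about tie order.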

Richard Sitze answered Dec 04 '25 22:12