Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the most frequent/common element in a collection?

Tags:

list

scala

What is the best way to find the most frequent/common element in a collection? For example:

list = List(1, 3, 4, 4, 2)
list.mostCommon   // => 4        !! This is what I want !!

Hmm.. What one could do is to do a groupBy first and then map them by length, and then select the biggest one. So then you would get:

Map(1 -> List(1), 4 -> List(4, 4), 3 -> List(3), 2 -> List(2))
(...)
Map(1 -> 1, 4 -> 2, 3 -> 1, 2 -> 1)  // mapped by length. 4 -> 2  since there's two 4s

And then in the end, choose the key (4) that maps to the highest number (2). (nested question: what is the best way to do that?). But that seems like a lot of work for such a simple operation..?

Is there a better / more idiomatic way to do this?

like image 285
kornfridge Avatar asked Dec 14 '12 11:12

kornfridge


1 Answers

I have to say that:

list.groupBy(identity).mapValues(_.size).maxBy(_._2)._1

Or just:

list.groupBy(identity).maxBy(_._2.size)._1

Doesn't really seem like that much work to me.

If you're worried about the overhead of building up the lists for each value when you only need counts, you could do the following:

list.foldLeft(Map.empty[Int, Int].withDefaultValue(0)) {
  case (m, v) => m.updated(v, m(v) + 1)
}.maxBy(_._2)._1

Or even keep track of the maximum as you go, to avoid the extra traversal at the end:

list.foldLeft(
  Map.empty[Int, Int].withDefaultValue(0), -1 -> Double.NegativeInfinity
) {
  case ((m, (maxV, maxCount)), v) =>
    val count = m(v) + 1
    if (count > maxCount) (m.updated(v, count), v -> count)
    else (m.updated(v, count), maxV -> maxCount)
}._2._1

This is obviously much less readable than the one-liners above, though, so I'd recommend sticking with them unless you can show (i.e., with benchmarking, not speculation) that they're a bottleneck in your application.

like image 142
Travis Brown Avatar answered Oct 25 '22 09:10

Travis Brown