 

Count occurrences of each element in a List[List[T]] in Scala

Suppose you have

val docs = List(List("one", "two"), List("two", "three"))

where e.g. List("one", "two") represents a document containing terms "one" and "two", and you want to build a map with the document frequency for every term, i.e. in this case

Map("one" -> 1, "two" -> 2, "three" -> 1)

How would you do that in Scala? (And in an efficient way, assuming a much larger dataset.)

My first Java-like thought is to use a mutable map:

import scala.collection.mutable

val freqs = mutable.Map.empty[String, Int]
for (doc <- docs)
  for (term <- doc)
    freqs(term) = freqs.getOrElse(term, 0) + 1

which works well enough, but I'm wondering how you could do the same in a more "functional" way, without resorting to a mutable map.
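For comparison, here is a minimal sketch of one way to write the loop above without mutation. An immutable Map is threaded through nested foldLeft calls, and updated returns a new map instead of changing one in place. Like the loop, it counts every occurrence of a term. It also avoids building the per-term intermediate lists that a groupBy-based solution creates. The object and method names (FoldCounts, termCounts) are illustrative only.

```scala
// Illustrative sketch: FoldCounts/termCounts are hypothetical names.
object FoldCounts {
  // Counts every occurrence of each term across all documents,
  // like the mutable loop, but with an immutable Map as the accumulator.
  def termCounts[T](docs: List[List[T]]): Map[T, Int] =
    docs.foldLeft(Map.empty[T, Int]) { (acc, doc) =>
      doc.foldLeft(acc) { (m, term) =>
        // updated returns a new Map; getOrElse supplies 0 for unseen terms
        m.updated(term, m.getOrElse(term, 0) + 1)
      }
    }

  def main(args: Array[String]): Unit = {
    val docs = List(List("one", "two"), List("two", "three"))
    println(termCounts(docs))
  }
}
```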

asked Aug 28 '12 at 19:08 by Mirko N.




1 Answer

Try this:

scala> docs.flatten.groupBy(identity).mapValues(_.size)
res0: Map[String,Int] = Map(one -> 1, two -> 2, three -> 1)

If you are going to access the counts many times, avoid mapValues: it returns a lazy view of the map, so the size is recomputed on every access. This version gives the same result without the recomputation:

docs.flatten.groupBy(identity).map(x => (x._1, x._2.size))
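On Scala 2.13 and later, Map.mapValues is deprecated, and the usual strict spelling is .view.mapValues(...).toMap, which forces the view once. 2.13 also adds groupMapReduce, which groups, maps each element to 1, and sums in a single pass without keeping the per-term lists. The sketch below assumes Scala 2.13+; the object and value names are illustrative.

```scala
// Assumes Scala 2.13+ (view.mapValues and groupMapReduce); names are illustrative.
object StrictCounts {
  val docs = List(List("one", "two"), List("two", "three"))

  // .view.mapValues is lazy; .toMap materializes the sizes exactly once.
  val viaView: Map[String, Int] =
    docs.flatten.groupBy(identity).view.mapValues(_.size).toMap

  // Key by the term itself, map every occurrence to 1, and add them up.
  val viaGroupMapReduce: Map[String, Int] =
    docs.flatten.groupMapReduce(identity)(_ => 1)(_ + _)

  def main(args: Array[String]): Unit = {
    println(viaView)
    println(viaGroupMapReduce)
  }
}
```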

The identity function just means x => x.
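One caveat: the question asks for document frequency, meaning the number of documents that contain each term. flatten keeps every occurrence, so a term repeated inside one document would be counted more than once. A minimal sketch, assuming that repeated terms within a document should count once, removes duplicates per document with distinct before counting. The docFreq name and the sample input with a repeated "two" are made up for illustration.

```scala
// Illustrative sketch: DocumentFrequency/docFreq are hypothetical names.
object DocumentFrequency {
  // Number of documents containing each term: distinct drops repeats
  // within a single document before the per-term counts are taken.
  def docFreq[T](docs: List[List[T]]): Map[T, Int] =
    docs.flatMap(_.distinct).groupBy(identity).map { case (term, occurrences) =>
      term -> occurrences.size
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical input: "two" appears twice in the first document.
    val docs = List(List("one", "two", "two"), List("two", "three"))
    println(docFreq(docs)) // "two" counts once per document that contains it
  }
}
```

On the question's docs, where no term repeats within a document, this agrees with the flatten-based versions.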

answered Oct 21 '22 at 15:10 by dhg