Suppose you have
val docs = List(List("one", "two"), List("two", "three"))
where e.g. List("one", "two") represents a document containing terms "one" and "two", and you want to build a map with the document frequency for every term, i.e. in this case
Map("one" -> 1, "two" -> 2, "three" -> 1)
How would you do that in Scala? (And in an efficient way, assuming a much larger dataset.)
My first Java-like thought is to use a mutable map:
val freqs = mutable.Map.empty[String,Int]
for (doc <- docs)
for (term <- doc)
freqs(term) = freqs.getOrElse(term, 0) + 1
which works well enough but I'm wondering how you could do that in a more "functional" way, without resorting to a mutable map?
Use the list. count() method of the built-in list class to get the number of occurrences of an item in the given list.
Scala Stack count() method with example In Scala Stack class , the count() method is utilized to count the number of elements in the stack that satisfies a given predicate. Return Type: It returns the count the number of elements in the stack that satisfies a given predicate.
The easiest way to count the number of occurrences in a Python list of a given item is to use the Python . count() method. The method is applied to a given list and takes a single argument. The argument passed into the method is counted and the number of occurrences of that item in the list is returned.
Try this:
scala> docs.flatten.groupBy(identity).mapValues(_.size)
res0: Map[String,Int] = Map(one -> 1, two -> 2, three -> 1)
If you are going to be accessing the counts many times, then you should avoid mapValues
since it is "lazy" and, thus, would recompute the size on every access. This version gives you the same result but won't require the recomputations:
docs.flatten.groupBy(identity).map(x => (x._1, x._2.size))
The identity
function just means x => x
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With