Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Idiomatic way to reduce a list of pairs to a map of keys and their aggregated count?

Tags:

scala

I'm trying to recreate Hadoop's word count map / reduce logic in a simple Scala program for learning

This is what I have so far

val words1 = "Hello World Bye World"      
val words2 = "Hello Hadoop Goodbye Hadoop"

val input = List(words1,words2)           
val mapped = input.flatMap(line=>line.split(" ").map(word=>word->1))
    //> mapped  : List[(String, Int)] = List((Hello,1), (World,1), (Bye,1), 
    //                                       (World,1), (Hello,1), (Hadoop,1), 
    //                                       (Goodbye,1), (Hadoop,1))

mapped.foldLeft(Map[String,Int]())((sofar,item)=>{
    if(sofar.contains(item._1)){
        sofar.updated(item._1, item._2 + sofar(item._1))
    }else{
        sofar + item
    }
})                              
    //>Map(Goodbye -> 1, Hello -> 2, Bye -> 1, Hadoop -> 2, World -> 2)

This seems to work, but I'm sure there is a more idiomatic way to handle the reduce part (foldLeft)

I was thinking about perhaps a multimap, but I have a feeling Scala has a way to do this easily

Is there? e.g. a way to add to a map, and if the key exists, instead of replacing it, adding the value to the existing value. I'm sure I've seen this quesion somewhere, but couldn't find it and neither the answer.

I know groupBy is the way to do it probably in the real world, but I'm trying to implement it as close as possible to the original map/reduce logic in the link above.

like image 939
Eran Medan Avatar asked Dec 13 '12 21:12

Eran Medan


People also ask

How do I convert a list to map in Scala?

To convert a list into a map in Scala, we use the toMap method. We must remember that a map contains a pair of values, i.e., key-value pair, whereas a list contains only single values. So we have two ways to do so: Using the zipWithIndex method to add indices as the keys to the list.

What is Scala map function?

Advertisements. map() method is a member of TraversableLike trait, it is used to run a predicate method on each elements of a collection. It returns a new collection.

How do you make a map in Scala?

Map class explicitly. If you want to use both mutable and immutable Maps in the same, then you can continue to refer to the immutable Map as Map but you can refer to the mutable set as mutable. Map. While defining empty map, the type annotation is necessary as the system needs to assign a concrete type to variable.


1 Answers

You can use Scalaz's |+| operator because Maps are part of the Semigroup typeclass:

The |+| operator is the Monoid mappend function (a Monoid is any "thing" that can be "added" together. Many things can be added together like this: Strings, Ints, Maps, Lists, Options etc. An example:

scala> import scalaz._
import scalaz._

scala> import Scalaz._
import Scalaz._

scala> val map1 = Map(1 -> 3 , 2 -> 4)
map1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3, 2 -> 4)

scala> val map2 = Map(1 -> 1, 3 -> 6)
map2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 3 -> 6)

scala> map1 |+| map2
res2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 4, 3 -> 6, 2 -> 4)

So in your case, rather then create a List[(String,Int)], create a List[Map[String,Int]], and then sum them:

val mapped = input.flatMap(_.split(" ").map(word => Map(word -> 1)))
mapped.suml
like image 143
Dominic Bou-Samra Avatar answered Oct 09 '22 09:10

Dominic Bou-Samra