I'm trying to recreate Hadoop's word count map / reduce logic in a simple Scala program for learning
This is what I have so far
val words1 = "Hello World Bye World"
val words2 = "Hello Hadoop Goodbye Hadoop"
val input = List(words1,words2)
val mapped = input.flatMap(line=>line.split(" ").map(word=>word->1))
//> mapped : List[(String, Int)] = List((Hello,1), (World,1), (Bye,1),
// (World,1), (Hello,1), (Hadoop,1),
// (Goodbye,1), (Hadoop,1))
mapped.foldLeft(Map[String,Int]())((sofar,item)=>{
if(sofar.contains(item._1)){
sofar.updated(item._1, item._2 + sofar(item._1))
}else{
sofar + item
}
})
//>Map(Goodbye -> 1, Hello -> 2, Bye -> 1, Hadoop -> 2, World -> 2)
This seems to work, but I'm sure there is a more idiomatic way to handle the reduce part (foldLeft)
I was thinking about perhaps a multimap, but I have a feeling Scala has a way to do this easily
Is there? e.g. a way to add to a map, and if the key exists, instead of replacing it, adding the value to the existing value. I'm sure I've seen this quesion somewhere, but couldn't find it and neither the answer.
I know groupBy
is the way to do it probably in the real world, but I'm trying to implement it as close as possible to the original map/reduce logic in the link above.
To convert a list into a map in Scala, we use the toMap method. We must remember that a map contains a pair of values, i.e., key-value pair, whereas a list contains only single values. So we have two ways to do so: Using the zipWithIndex method to add indices as the keys to the list.
Advertisements. map() method is a member of TraversableLike trait, it is used to run a predicate method on each elements of a collection. It returns a new collection.
Map class explicitly. If you want to use both mutable and immutable Maps in the same, then you can continue to refer to the immutable Map as Map but you can refer to the mutable set as mutable. Map. While defining empty map, the type annotation is necessary as the system needs to assign a concrete type to variable.
You can use Scalaz's |+|
operator because Maps
are part of the Semigroup typeclass:
The |+|
operator is the Monoid mappend
function (a Monoid is any "thing" that can be "added" together. Many things can be added together like this: Strings, Ints, Maps, Lists, Options etc. An example:
scala> import scalaz._
import scalaz._
scala> import Scalaz._
import Scalaz._
scala> val map1 = Map(1 -> 3 , 2 -> 4)
map1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3, 2 -> 4)
scala> val map2 = Map(1 -> 1, 3 -> 6)
map2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 3 -> 6)
scala> map1 |+| map2
res2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 4, 3 -> 6, 2 -> 4)
So in your case, rather then create a List[(String,Int)]
, create a List[Map[String,Int]]
, and then sum them:
val mapped = input.flatMap(_.split(" ").map(word => Map(word -> 1)))
mapped.suml
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With