
Error: type mismatch flatMap

I am new to Spark programming and Scala, and I do not understand the difference between map and flatMap. I tried the code below, expecting both calls to work, but got an error.

scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)

scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))

scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
 found   : (String, Int)
 required: scala.collection.GenTraversableOnce[?]
              b.flatMap(x => (x,1))

As I understand it, flatMap turns an RDD into a collection of its String/Int elements, so I expected both calls to work without any error. Please let me know where I am making the mistake.

Thanks

Anaadih.pradeep asked Dec 18 '22 12:12


2 Answers

You need to look at how the signatures of these methods are defined:

def map[U: ClassTag](f: T => U): RDD[U]

map takes a function from type T to type U and returns an RDD[U].

On the other hand, flatMap:

def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]

It expects a function from type T to a TraversableOnce[U], a trait that Tuple2 doesn't implement, and returns an RDD[U]. The same applies to the List in your example: its flatMap requires a GenTraversableOnce, which is why the compiler rejects the tuple. Generally, you use flatMap when you want to flatten a collection of collections; for example, if you had an RDD[List[List[Int]]] and wanted to produce an RDD[List[Int]], you could flatMap it using identity.
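A minimal sketch of the point above, using a plain Scala List instead of an RDD so that it runs without a Spark context. The object name FlatMapTupleSketch and the value names are made up for illustration and are not from the question. Wrapping the tuple in a collection gives flatMap the shape it requires, and flatMap(identity) removes one level of nesting:

```scala
// Illustrative sketch: a plain List stands in for an RDD (assumption),
// and all names here are hypothetical.
object FlatMapTupleSketch {
  val b: List[String] = List("1", "2", "4", "5")

  // b.flatMap(x => (x, 1)) does not compile: a tuple is not a collection.
  // Wrapping the tuple in a List gives flatMap something it can flatten.
  val pairs: List[(String, Int)] = b.flatMap(x => List((x, 1)))

  // flatMap(identity) removes exactly one level of nesting.
  val nested: List[List[List[Int]]] = List(List(List(1, 2), List(3)), List(List(4)))
  val flattened: List[List[Int]] = nested.flatMap(identity)

  def main(args: Array[String]): Unit = {
    println(pairs)     // List((1,1), (2,1), (4,1), (5,1))
    println(flattened) // List(List(1, 2), List(3), List(4))
  }
}
```

Here pairs ends up identical to the result of b.map(x => (x, 1)), which is a hint that map is the better fit when each input yields exactly one output.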

Yuval Itzchakov answered Dec 27 '22 22:12


map(func) Return a new distributed dataset formed by passing each element of the source through a function func.

flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).

The following example might be helpful.

        scala> val b = List("1", "2", "4", "5")
        b: List[String] = List(1, 2, 4, 5)

        scala> b.map(x=>Set(x,1))
        res69: List[scala.collection.immutable.Set[Any]] = List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))

        scala> b.flatMap(x=>Set(x,1))
        res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)

        scala> b.flatMap(x=>List(x,1))
        res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)

        scala> b.flatMap(x=>List(x+1))
        res75: List[String] = List(11, 21, 41, 51) // x is a String, so x+1 concatenates




        scala> val x = sc.parallelize(List("aa bb cc dd",  "ee ff gg hh"), 2)

        scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
        scala> y.collect
        res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))

        scala> val y = x.flatMap(x => x.split(" "))
        scala> y.collect
        res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
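
To connect the two definitions quoted at the top of this answer: on ordinary Scala collections, flatMap behaves like map followed by flatten, and the function may return an empty collection to drop an element. The following is a small sketch that runs without Spark; the names FlatMapSketch, lines, and evens are made up for illustration.

```scala
// Illustrative sketch on plain Scala collections (no Spark context assumed);
// all names here are hypothetical.
object FlatMapSketch {
  val lines: List[String] = List("aa bb cc dd", "ee ff gg hh")

  // map keeps one output per input, so the result is nested.
  val nested: List[List[String]] = lines.map(_.split(" ").toList)

  // flatMap yields the same elements with one level of nesting removed.
  val words: List[String] = lines.flatMap(_.split(" ").toList)

  // "0 or more output items": return Nil to drop an element entirely.
  val evens: List[Int] =
    List(1, 2, 3, 4).flatMap(n => if (n % 2 == 0) List(n) else Nil)

  def main(args: Array[String]): Unit = {
    println(words == nested.flatten) // true
    println(evens)                   // List(2, 4)
  }
}
```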
Kris answered Dec 27 '22 23:12