
On Spark's rdd.map(_.swap)

I'm new to both Scala and Spark. Could anyone explain what this means?

rdd.map(_.swap)

Looking through the Scala and Spark API docs, I cannot find swap as a method of the RDD class.

oikonomiyaki asked Dec 18 '22 20:12



1 Answer

swap is not an RDD method; it is a method on Scala's Tuple2 (a pair). It returns a new tuple with the first and second elements exchanged. For example:

scala> val pair = ("a","b")
pair: (String, String) = (a,b)

scala> val swapped = pair.swap
swapped: (String, String) = (b,a)

RDD's map function applies a given function to each element of the RDD. In this case, the function to be applied to each element is simply

_.swap

The underscore is Scala's placeholder syntax for anonymous functions: it stands for the function's parameter without naming it. So the snippet above can be rewritten as:

rdd.map{ pair => pair.swap }
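To check that these forms really are equivalent without needing a Spark installation, here is a minimal sketch that uses a plain Scala List as a stand-in for the RDD (List.map has the same per-element semantics as RDD.map). The sample data and the value names are made up for illustration, and the snippet is written in script/REPL style.

```scala
// A plain Scala List standing in for an RDD of pairs (assumption: no Spark needed here).
val pairs = List(("a", 1), ("b", 2), ("c", 3))

// Placeholder syntax: the underscore stands for each element.
val viaPlaceholder = pairs.map(_.swap)

// The same function with an explicitly named parameter.
val viaNamedParam = pairs.map(pair => pair.swap)

// The same result by pattern matching on the pair, without calling swap.
val viaPattern = pairs.map { case (k, v) => (v, k) }

println(viaPlaceholder) // List((1,a), (2,b), (3,c))
```

All three values hold the same list; which form to use is mostly a matter of readability.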

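So the snippet you posted swaps the first and second elements of each pair in the RDD and returns a new RDD; for example, an RDD[(String, Int)] becomes an RDD[(Int, String)]. It compiles only when the RDD's elements are pairs.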
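A common reason to write rdd.map(_.swap) is to turn the value into the key, for instance to sort (word, count) pairs by count. The sketch below assumes Spark is on the classpath and runs in local mode; the app name, the sample data, and the value names are hypothetical. In spark-shell an sc already exists, and SparkContext.getOrCreate will reuse it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local-mode context for illustration; app name and master are assumptions.
val conf = new SparkConf().setAppName("swap-example").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical (word, count) data.
val counts = sc.parallelize(Seq(("spark", 3), ("scala", 7), ("rdd", 1)))

// Swap to (count, word) so sortByKey orders by count, then swap back.
val byCountDesc = counts
  .map(_.swap)
  .sortByKey(ascending = false)
  .map(_.swap)
  .collect()

byCountDesc.foreach(println)
// (scala,7)
// (spark,3)
// (rdd,1)

// In a standalone program, call sc.stop() once you are finished with the context.
```

RDD.sortBy(_._2, ascending = false) would give the same ordering in one step; the swap version is shown only because it illustrates the idiom from the question.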

Ton Torres answered Dec 29 '22 16:12
