
Apache-Spark : What is map(_._2) shorthand for?

I read a project's source code, found:

val sampleMBR = inputMBR.map(_._2).sample

inputMBR is a tuple.

The function map's definition is:

map[U: ClassTag](f: T => U): RDD[U]

It seems that map(_._2) is shorthand for map(x => x._2).

Can anyone tell me the rules for this shorthand?

chenzhongpu asked Mar 25 '15 02:03


People also ask

What does _._2 mean in Scala?

In Scala, ._2 is shorthand for accessing the second element of a tuple.

What is map in Apache Spark?

Map: map is a transformation operation in Apache Spark. It applies a function to each element of an RDD and returns the result as a new RDD. The developer defines the logic inside map, and the same logic is applied to every element of the RDD.
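To illustrate the one-element-in, one-element-out behavior described above, here is a minimal sketch using a plain Scala collection as a stand-in for an RDD (Spark's RDD.map behaves analogously on distributed data):

```scala
// Sketch: map applies the same logic to every element,
// producing exactly one output element per input element.
object MapSketch {
  def main(args: Array[String]): Unit = {
    val nums = List(1, 2, 3, 4)
    val doubled = nums.map(n => n * 2) // custom logic applied uniformly
    println(doubled) // List(2, 4, 6, 8)
  }
}
```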

What is map in spark Scala?

Spark map() is a transformation operation that applies a function to every element of an RDD, DataFrame, or Dataset and returns a new RDD or Dataset respectively.

What is the difference between map and flatMap in spark?

Spark's map function expresses a one-to-one transformation: it transforms each element of a collection into exactly one element of the resulting collection. Spark's flatMap function expresses a one-to-many transformation: it transforms each element into zero or more elements.
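The one-to-one versus one-to-many distinction can be sketched with plain Scala collections, which share the same map/flatMap semantics as RDDs:

```scala
// Sketch: map keeps one output per input; flatMap flattens 0..n outputs.
object MapVsFlatMap {
  def main(args: Array[String]): Unit = {
    val lines = List("a b", "c")
    val mapped = lines.map(_.split(" ").toList) // one element in, one List out
    val flat   = lines.flatMap(_.split(" "))    // one element in, 0..n elements out
    println(mapped) // List(List(a, b), List(c))
    println(flat)   // List(a, b, c)
  }
}
```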


2 Answers

The _ syntax can be a bit confusing. When _ is used on its own it represents an argument in the anonymous function, and each successive _ stands for the next argument in turn. So where a two-argument function is expected, _._2 + _._2 would be shorthand for (x, y) => x._2 + y._2. When _ appears after a dot, as in _._2, it is the anonymous argument and ._2 is an ordinary member access: x._2 returns the second element of a tuple (assuming x is a tuple).
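The two uses of the underscore in this answer can be sketched like this (reduce is used for the two-argument case, since each _ binds to a successive parameter):

```scala
// Sketch: single vs successive underscore placeholders.
object UnderscoreSketch {
  def main(args: Array[String]): Unit = {
    val pairs = List((1, 'a'), (2, 'b'), (3, 'c'))
    // One underscore: one anonymous argument, same as pairs.map(x => x._2).
    val seconds = pairs.map(_._2)
    // Each underscore stands for the next argument,
    // so _ + _ desugars to (x, y) => x + y.
    val sum = pairs.map(_._1).reduce(_ + _)
    println(seconds) // List(a, b, c)
    println(sum)     // 6
  }
}
```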

Holden answered Sep 23 '22 18:09


collection.map(_._2) emits the second component of each tuple. Example from pure Scala (Spark RDDs work the same way):

scala> val zipped = (1 to 10).zip('a' to 'j')
zipped: scala.collection.immutable.IndexedSeq[(Int, Char)] = Vector((1,a), (2,b), (3,c), (4,d), (5,e), (6,f), (7,g), (8,h), (9,i), (10,j))

scala> val justLetters = zipped.map(_._2)
justLetters: scala.collection.immutable.IndexedSeq[Char] = Vector(a, b, c, d, e, f, g, h, i, j)
marekinfo answered Sep 23 '22 18:09