Get multiple columns within a map on an RDD

I have a DataFrame that I explicitly convert to an RDD, and I'm trying to read several columns from each record. I can't get more than one of them out of a single map. Here is what I've tried:

val df = sql("Select col1, col2, col3, col4, col5 from tableName").rdd

The resulting df is of type org.apache.spark.rdd.RDD[org.apache.spark.sql.Row].

Now I'm trying to access each element of this RDD via:

val dfrdd = df.map{x => x.get(0); x.getAs[String](1); x.get(3)}

The problem is that this statement returns only the value of the last expression in the map, i.e. the data from x.get(3). Can someone tell me what I'm doing wrong?

knowone asked Feb 22 '26 18:02


1 Answer

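A function block evaluates to its last expression, so map returns only that value. In your case x.get(0) and x.getAs[String](1) are evaluated and then discarded, and only x.get(3) is returned.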

To return multiple values, wrap them in a tuple, as below:

val dfrdd = df.map{x => (x.get(0), x.getAs[String](1), x.get(3))}
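
The difference can be seen without a Spark cluster. The following is only a minimal sketch: plain Scala collections stand in for the RDD, each inner Seq plays the role of a Row, and the object name, the sample data, and the use of asInstanceOf in place of getAs[String] are all assumptions made for illustration. The block semantics are a property of Scala itself, so the same reasoning should carry over to RDD.map.

```scala
object BlockVsTuple {
  // Hypothetical sample data: each inner Seq stands in for one Row.
  val rows: Seq[Seq[Any]] = Seq(
    Seq(1, "a", 2.5, true),
    Seq(2, "b", 3.5, false)
  )

  // Three expressions separated by semicolons: the first two are
  // evaluated and thrown away, and only x(3) becomes the result.
  val lastOnly: Seq[Any] = rows.map { x => x(0); x(1); x(3) }

  // A single tuple expression: all three values are kept.
  // asInstanceOf[String] stands in for Row.getAs[String] here.
  val asTuples: Seq[(Any, String, Any)] =
    rows.map { x => (x(0), x(1).asInstanceOf[String], x(3)) }
}
```

Here lastOnly holds one value per record, while asTuples holds a three-element tuple per record, which can then be unpacked with pattern matching or the _1, _2, _3 accessors.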

Hope this helped!

koiralo answered Feb 25 '26 12:02