
How to convert RDD[Row] to RDD[String]

I have a DataFrame called source, loaded from a MySQL table:

val source = sqlContext.read.jdbc(jdbcUrl, "source", connectionProperties)

I converted it to an RDD with:

val sourceRdd = source.rdd

but this gives an RDD[Row]. I need an RDD[String] so that I can apply transformations such as:

source.map(rec => (rec.split(",")(0).toInt, rec)), .subtractByKey(), etc.

Thank you

Vickyster asked May 19 '17 10:05


2 Answers

You can use the Row.mkString(sep: String): String method in a map call, like this:

val sourceRdd = source.rdd.map(_.mkString(","))

You can replace the "," separator with whatever you want.
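
To show how this fits the transformations from the question, here is a minimal sketch that can be pasted into spark-shell. The two small in-memory DataFrames (source and target, with hypothetical id and name columns) only stand in for the MySQL tables; the local SparkSession and the keyed helper are assumptions made for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in the question, `source` is read over JDBC.
val spark = SparkSession.builder().master("local[*]").appName("row-to-string").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for the MySQL tables: (id, name)
val source = Seq((1, "alice"), (2, "bob"), (3, "carol")).toDF("id", "name")
val target = Seq((2, "bob")).toDF("id", "name")

// Row.mkString joins all column values of a row with the given separator
val sourceRdd: RDD[String] = source.rdd.map(_.mkString(","))
val targetRdd: RDD[String] = target.rdd.map(_.mkString(","))

// Key each line by its first field, as in the question (hypothetical helper)
def keyed(rdd: RDD[String]): RDD[(Int, String)] =
  rdd.map(rec => (rec.split(",")(0).toInt, rec))

// Lines whose key exists in source but not in target
val onlyInSource = keyed(sourceRdd).subtractByKey(keyed(targetRdd)).values.collect().sorted
onlyInSource.foreach(println)
```

Keep in mind that splitting the joined line again is only safe when no column value contains the separator itself, and that a null column ends up as the literal text null in the string.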

Hope this helps. Best regards.

Haroun Mohammedi answered Oct 10 '22 04:10


What is your schema?

If it's just a single String column, you can use:

import spark.implicits._
val sourceDS = source.as[String]
val sourceRdd = sourceDS.rdd // will give RDD[String]

Note: in Spark 1.6, use sqlContext instead of spark. spark is a SparkSession, a class introduced in Spark 2.0 as the new entry point to SQL functionality; in Spark 2.x it should be used instead of SQLContext.
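
As a small illustration of the single-column case, the sketch below assumes Spark 2.x and uses a made-up one-column DataFrame named lines in place of a real table; the local SparkSession is also an assumption.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Local session for illustration only
val spark = SparkSession.builder().master("local[*]").appName("as-string").getOrCreate()
import spark.implicits._

// Hypothetical single-column DataFrame; each value is already a comma-separated line
val lines = Seq("1,alice", "2,bob").toDF("line")

// as[String] applies here because the schema is exactly one string column
val linesRdd: RDD[String] = lines.as[String].rdd

// The string transformations from the question now work directly
val firstFields = linesRdd.map(_.split(",")(0).toInt).collect().sorted
```

A DataFrame with several columns, which a JDBC table usually is, does not map onto String this way, so this route only fits when the schema really is one string column.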

You can also define your own case classes.
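
A minimal sketch of the case class route, meant for spark-shell on Spark 2.x. SourceRecord and its id and name fields are hypothetical and would have to match the real column names and types of the table; the in-memory DataFrame stands in for the JDBC source.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical record type; field names and types must match the table's columns
case class SourceRecord(id: Int, name: String)

// Local session for illustration only
val spark = SparkSession.builder().master("local[*]").appName("case-class").getOrCreate()
import spark.implicits._

// Stand-in for the MySQL table
val source = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// Typed records instead of Row, so fields are accessed by name
val records: RDD[SourceRecord] = source.as[SourceRecord].rdd

// Key by id without building and re-splitting a string
val keyedRecords: RDD[(Int, SourceRecord)] = records.map(r => (r.id, r))
val ids = keyedRecords.keys.collect().sorted
```

With typed records the key comes straight from the field, so operations like subtractByKey can be applied without converting rows to strings first. In a compiled application, the case class should be declared at the top level rather than inside a method so that Spark can derive its encoder.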

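You can also map over the rows directly. Here source is a DataFrame, and map is given a partial function that matches each Row (this needs import org.apache.spark.sql.Row). The cast assumes the first column is a String, and the final split produces an RDD[Array[String]], so drop it if you want an RDD[String]:

val sourceRdd = source.rdd.map { case x: Row => x(0).asInstanceOf[String] }.map(s => s.split(","))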

T. Gawęda answered Oct 10 '22 05:10