
Spark Scala: retrieve the schema and store it

Is it possible to retrieve the schema of an existing DataFrame and store it in a variable? I want to create a new DataFrame from another RDD using the same schema. For example, this is what I am hoping to have:

val schema = oldDF.getSchema()
val newDF = sqlContext.createDataFrame(rowRDD, schema)

Assuming I already have rowRDD as an RDD[org.apache.spark.sql.Row], is this possible?

Edamame asked May 23 '16 21:05


1 Answer

Just use the schema attribute of the existing DataFrame:

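import org.apache.spark.sql.Row

// Existing DataFrame whose schema will be reused
val oldDF = sqlContext.createDataFrame(sc.parallelize(Seq(("a", 1))))
val rowRDD = sc.parallelize(Seq(Row("b", 2)))

// oldDF.schema is a StructType, so it can be passed straight to createDataFrame
sqlContext.createDataFrame(rowRDD, oldDF.schema)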
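
For completeness, here is a minimal self-contained sketch of the same idea that stores the schema in a variable first, as the question asks. It assumes Spark 2.0 or later, where SparkSession takes the place of sqlContext. The object name ReuseSchema, the helper withSchemaOf, the column names and the local master setting are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

object ReuseSchema {
  // Hypothetical helper: builds a DataFrame from raw rows using the
  // schema of an existing DataFrame.
  def withSchemaOf(spark: SparkSession, template: DataFrame, rows: Seq[Row]): DataFrame = {
    // The schema is an ordinary StructType value, so it can be kept in a variable.
    val schema: StructType = template.schema
    val rowRDD = spark.sparkContext.parallelize(rows)
    spark.createDataFrame(rowRDD, schema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode and this app name are for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reuse-schema")
      .getOrCreate()
    import spark.implicits._

    // Existing DataFrame whose schema we want to reuse.
    val oldDF = Seq(("a", 1)).toDF("name", "count")

    // New rows must match the schema's column order and types.
    val newDF = withSchemaOf(spark, oldDF, Seq(Row("b", 2)))

    newDF.printSchema()
    newDF.show()

    spark.stop()
  }
}
```

Note that createDataFrame does not check the rows against the schema up front, so a Row whose values do not match the column types will typically only fail later, when an action such as show() runs.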
5ba86145 answered Sep 30 '22 01:09