
How to convert an RDD of Maps to dataframe

I have an RDD of Map and I want to convert it to a DataFrame. Here is the input format of the RDD:

val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

Is there any way to convert it into a DataFrame, like:

 val df = mapRDD.toDF

df.show

empid  empName  depId
12     Rohan    201
13     Ross     201
14     Richard  401
15     Michale  501
16     John     701
asked Nov 24 '16 by ta.


People also ask

How will you convert an RDD into a DataFrame or Dataset?

Use the createDataFrame method, which takes an RDD and creates a DataFrame from it. createDataFrame is overloaded: you can call it with the RDD alone or with the RDD plus a schema. Without a schema, the column names follow a default naming template.
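For example, a minimal sketch of both call styles (assuming a SparkSession named `spark` is already in scope, as in the Spark shell; the column names are taken from the question):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// 1) RDD alone: the schema is inferred by reflection; columns get default names _1, _2, _3
val tupleRDD = spark.sparkContext.parallelize(Seq(("12", "Rohan", "201")))
val inferredDF = spark.createDataFrame(tupleRDD)

// 2) RDD of Row plus an explicit schema: columns get the names we declare
val rowRDD = tupleRDD.map { case (id, name, dep) => Row(id, name, dep) }
val schema = StructType(Seq(
  StructField("empid", StringType),
  StructField("empName", StringType),
  StructField("depId", StringType)))
val explicitDF = spark.createDataFrame(rowRDD, schema)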

Can we convert an RDD to a DataFrame in PySpark?

Yes. One way is the createDataFrame() function: after creating the RDD, convert it to a DataFrame by passing the RDD and a schema for the DataFrame to createDataFrame().

How will you create a DataFrame from an RDD with a schema?

To create a DataFrame with a schema using createDataFrame(), first define the schema, then convert the RDD to an RDD of type Row. Pass the RDD[Row] and the schema to createDataFrame() to build the DataFrame.
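Applied to the question's RDD of maps, that looks roughly like this (a sketch, assuming a SparkSession named `spark`, as in the Spark shell; values are looked up by key, so the result does not depend on the maps' ordering):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Column names and schema for the employee records (all values are strings here)
val columns = Seq("empid", "empName", "depId")
val schema  = StructType(columns.map(name => StructField(name, StringType)))

// Build one Row per map, looking each value up by its key
val rowRDD = mapRDD.map(m => Row.fromSeq(columns.map(c => m.getOrElse(c, null))))

val df = spark.createDataFrame(rowRDD, schema)
df.show()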


1 Answer

You can easily convert it into a Spark DataFrame. Here is code that does the trick:

// In the Spark shell, `spark` and `sc` are predefined and these implicits are
// already imported; in a standalone application the import is needed for .toDF.
import spark.implicits._

val mapRDD = sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

// Take the column names from the keys of the first map
val columns = mapRDD.take(1).flatMap(a => a.keys)

// Turn each map's values into a tuple, then name the columns.
// Note: this relies on every map having the same keys in the same insertion
// order, which holds here since small immutable Scala maps preserve insertion order.
val resultantDF = mapRDD.map { value =>
      val list = value.values.toList
      (list(0), list(1), list(2))
    }.toDF(columns: _*)

resultantDF.show()

The output is:

+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   12|  Rohan|  201|
|   13|   Ross|  201|
|   14|Richard|  401|
|   15|Michale|  501|
|   16|   John|  701|
+-----+-------+-----+
answered Oct 08 '22 by Shivansh