Problem:
a) Read a local file into a pandas DataFrame, say PD_DF. b) Manipulate/massage PD_DF and add columns to it. c) Write PD_DF to HDFS using Spark. How do I do it?
You can use the SQLContext object to invoke the createDataFrame method, which accepts input data that can optionally be a pandas DataFrame object. (In Spark 2.x, the SparkSession object exposes the same createDataFrame method.)
Let's say dataframe is of type pandas.core.frame.DataFrame. In Spark 2.1 (PySpark) I did this:
rdd_data = spark.createDataFrame(dataframe).rdd
If you want to rename any columns or select only a few of them, do that before the .rdd call.
Hope it works for you too.