I have an RDD (we can call it myrdd) where each record in the RDD is of the form:
[('column 1',value), ('column 2',value), ('column 3',value), ... , ('column 100',value)]
I would like to convert this into a DataFrame in pyspark - what is the easiest way to do this?
Convert Using the createDataFrame Method

The SparkSession object has a utility method for creating a DataFrame: createDataFrame. This method can take an RDD and create a DataFrame from it. createDataFrame is overloaded, so we can call it by passing the RDD alone or together with a schema.
In the tutorial example this answer draws on, the pyspark.sql package is imported into the environment, a SparkSession is created with the app name 'Spark RDD to Dataframe PySpark', the sample data is defined in a "SampleDepartment" variable, and a "ResiDD" variable holds the resilient distributed dataset built from it.
Converting a Spark RDD to a DataFrame can be done with toDF(), with createDataFrame(), or by transforming the RDD into an RDD[Row] and passing that to createDataFrame().
How about using the toDF method? You only need to add the field names.
df = rdd.toDF(['column', 'value'])