How to export a table dataframe in PySpark to csv?

Tags:

python

dataframe

export-to-csv

apache-spark

apache-spark-sql

I am using Spark 1.3.1 (PySpark) and I have generated a table using a SQL query. I now have an object that is a DataFrame. I want to export this DataFrame object (I have called it "table") to a csv file so I can manipulate it and plot the columns. How do I export the DataFrame "table" to a csv file?

Thanks!

asked Jul 13 '15 13:07

PyRsquared

1 Answer

If the DataFrame fits in driver memory and you want to save it to the local file system, you can convert the Spark DataFrame to a local pandas DataFrame using the toPandas method and then simply use to_csv:

df.toPandas().to_csv('mycsv.csv')
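If pandas isn't installed on the driver, the same small-data approach works with only the standard library: collect the rows to the driver and write them with Python's csv module. This is a minimal sketch assuming Python 3; rows_to_csv is a hypothetical helper name, not a PySpark API. It relies on pyspark's Row being a tuple subclass, and unlike pandas' to_csv with its default settings it writes no index column.

```python
import csv


def rows_to_csv(columns, rows, path):
    """Write a header row plus data rows to a single local CSV file.

    rows_to_csv is a hypothetical helper (not part of PySpark). It is meant
    for a small DataFrame collected to the driver, e.g.
    rows_to_csv(df.columns, df.collect(), 'mycsv.csv').
    pyspark Row objects are tuples, so csv.writer can write them directly.
    """
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header line with the column names
        writer.writerows(rows)    # one line per row, no index column
```

With the question's DataFrame this would be called as rows_to_csv(table.columns, table.collect(), 'mycsv.csv'). Like toPandas, collect() pulls every row into driver memory, so it only suits small results.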

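Otherwise, you can use the spark-csv package (it has to be on the classpath, for example added with the --packages option when launching pyspark or spark-submit):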

Spark 1.3

df.save('mycsv.csv', 'com.databricks.spark.csv')

Spark 1.4+

df.write.format('com.databricks.spark.csv').save('mycsv.csv')

In Spark 2.0+, you can use the built-in csv data source directly:

df.write.csv('mycsv.csv')
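Whichever Spark writer is used, 'mycsv.csv' is created as a directory holding one part-* file per partition (plus marker files such as _SUCCESS), not as a single file. Calling df.coalesce(1) before writing gives a single part file, at the cost of funneling the whole write through one task. Alternatively, once the output is on a local file system, the parts can be combined afterwards. Below is a minimal sketch using only the standard library; merge_spark_csv_parts is a hypothetical helper, not a Spark API, and it assumes uncompressed part files where, if a header was written, every part begins with the same single-line header.

```python
import glob
import os


def merge_spark_csv_parts(output_dir, dest_path, has_header=True):
    """Concatenate the part-* files Spark wrote into output_dir into one CSV.

    Hypothetical helper (not a Spark API). Assumes uncompressed parts and,
    when has_header is True, that each non-empty part starts with the same
    one-line header; the header is then kept only once.
    """
    # Zero-padded names (part-00000, part-00001, ...) sort in partition order;
    # _SUCCESS and hidden .crc files do not match the 'part-*' pattern.
    parts = sorted(glob.glob(os.path.join(output_dir, 'part-*')))
    with open(dest_path, 'w', newline='') as out:
        header_written = False
        for part in parts:
            with open(part, newline='') as f:
                lines = f.readlines()
            if has_header and lines:
                header, lines = lines[0], lines[1:]
                if not header_written:
                    out.write(header)
                    header_written = True
            # Make sure consecutive parts don't run together on one line
            if lines and not lines[-1].endswith('\n'):
                lines[-1] += '\n'
            out.writelines(lines)
```

For example, after df.write.csv('mycsv.csv', header=True) this could be called as merge_spark_csv_parts('mycsv.csv', 'merged.csv').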

answered Oct 02 '22 10:10

zero323
