
How to export a table dataframe in PySpark to csv?

I am using Spark 1.3.1 (PySpark) and have generated a table using a SQL query. I now have a DataFrame object, which I have called "table", and I want to export it to a CSV file so I can manipulate the data and plot the columns. How do I export the DataFrame "table" to a CSV file?

Thanks!

Asked Jul 13 '15 by PyRsquared

People also ask

How do I export PySpark DataFrame to CSV?

In PySpark you can save (write/extract) a DataFrame to a CSV file on disk by using `dataframeObj.write.csv("path")`; with this you can also write the DataFrame to AWS S3, Azure Blob Storage, HDFS, or any other file system PySpark supports.

How do I export a DataFrame to a CSV file?

Pandas' `DataFrame.to_csv()` function exports the DataFrame to CSV format. If a file path argument is provided, the output is written to that file; otherwise, the CSV content is returned as a string. The `sep` parameter specifies a custom delimiter for the output; the default is a comma.
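A short sketch of both behaviors, using a made-up frame and file name:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# With a path argument, to_csv writes the file and returns None.
df.to_csv("pandas_out.csv", index=False, sep=",")

# Without a path, it returns the CSV content as a string.
text = df.to_csv(index=False)
print(text.splitlines()[0])  # header row: x,y
```

Passing `index=False` drops the pandas row index, which you usually do not want as an extra unnamed column in the CSV.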

How do I export a table from Databricks?

From the Azure Databricks home page, go to "Upload Data" (under Common Tasks) → "DBFS" → "FileStore". DBFS FileStore is where you create folders and save your data frames in CSV format.


1 Answer

If the data frame fits in driver memory and you want to save it to the local file system, you can convert the Spark DataFrame to a local Pandas DataFrame using the `toPandas` method and then simply use `to_csv`:

df.toPandas().to_csv('mycsv.csv') 

Otherwise you can use spark-csv:

  • Spark 1.3

    df.save('mycsv.csv', 'com.databricks.spark.csv') 
  • Spark 1.4+

    df.write.format('com.databricks.spark.csv').save('mycsv.csv') 

In Spark 2.0+ you can use the csv data source directly:

df.write.csv('mycsv.csv') 
Answered Oct 02 '22 by zero323