
How to append to a csv file using df.write.csv in pyspark?

I'm trying to append data to my CSV output using df.write.csv. This is what I did after following the Spark documentation at http://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter:

from pyspark.sql import DataFrameWriter
.....
df1 = sqlContext.createDataFrame(query1)
df1.write.csv("/opt/Output/sqlcsvA.csv", append) #also tried 'mode=append'

Executing the above code gives me this error:

NameError: name 'append' not defined

Without append, the error is:

The path already exists.

kavya asked Dec 19 '16 07:12



1 Answer

df.write.save(path='csv', format='csv', mode='append', sep='\t')
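The same save mode can likely be passed straight to df.write.csv, as long as it is given as a string keyword. The NameError in the question comes from the bare name append, which Python looks up as a variable. Below is a minimal sketch, assuming df is an existing Spark DataFrame; append_csv is a hypothetical helper name, not part of PySpark. Note that Spark treats the path as a directory and adds new part files to it, so "appending" does not extend a single .csv file.

```python
def append_csv(df, path):
    """Append the rows of a Spark DataFrame to the CSV output at `path`.

    `append_csv` is a hypothetical helper name used for illustration.
    """
    # The mode is a string keyword argument. A bare `append` is an
    # undefined Python name, which is what raises the NameError.
    df.write.csv(path, mode="append")


# Usage, assuming `df1` is the DataFrame from the question:
#   append_csv(df1, "/opt/Output/sqlcsvA.csv")
# The builder form should behave the same way:
#   df1.write.mode("append").csv("/opt/Output/sqlcsvA.csv")
```

If one single CSV file is really required, the part files have to be merged in a separate step after writing.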
Zhang Tong answered Sep 25 '22 22:09