 

Write spark dataframe to file using python and '|' delimiter

I have constructed a Spark dataframe from a query. What I wish to do is print the dataframe to a text file with all information delimited by '|', like the following:

+-------+----+----+----+
|Summary|col1|col2|col3|
+-------+----+----+----+
|row1   |1   |14  |17  |
|row2   |3   |12  |2343|
+-------+----+----+----+

How can I do this?

asked Jan 26 '17 by Brian Waters

People also ask

How do I convert a Spark DataFrame to a CSV file in Python?

In Spark, you can save (write) a DataFrame to a CSV file on disk with dataframeObj.write.csv("path"). This also lets you write the DataFrame to AWS S3, Azure Blob Storage, HDFS, or any other Spark-supported file system.
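For example, a minimal self-contained sketch (the SparkSession setup, sample data, and output path are illustrative, not from the answer):

from pyspark.sql import SparkSession

# Reuse or create a SparkSession; the app name is arbitrary.
spark = SparkSession.builder.appName("csv-write-example").getOrCreate()

# Small illustrative DataFrame matching the question's table.
df = spark.createDataFrame(
    [("row1", 1, 14, 17), ("row2", 3, 12, 2343)],
    ["Summary", "col1", "col2", "col3"],
)

# Write it out as CSV; "output/summary_csv" is a hypothetical directory.
df.write.csv("output/summary_csv", header=True)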

How do I change the delimiter of a CSV file in PySpark?

Use spark.read.option("delimiter", "\t").csv(file), or pass sep instead of delimiter.
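As a sketch (the path "data.tsv" is hypothetical), both forms below do the same thing:

# Using option("delimiter", ...), as in the snippet above.
df = spark.read.option("delimiter", "\t").option("header", "true").csv("data.tsv")

# Or passing sep (and header) directly to csv().
df = spark.read.csv("data.tsv", sep="\t", header=True)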

Where is delimiter in CSV file PySpark?

We can use .textFile to get the first row of the CSV file, work out the delimiter from it, and assign it to a variable.
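One rough way to do that (a sketch; the path and the list of candidate delimiters are assumptions, and the heuristic simply picks whichever candidate splits the header into the most fields):

# Grab only the first line of the file via the RDD API.
first_line = spark.sparkContext.textFile("data.csv").first()

# Candidate delimiters to test; extend as needed.
candidates = [",", "|", "\t", ";"]

# Choose the candidate that produces the most columns in the header row.
delimiter = max(candidates, key=lambda d: len(first_line.split(d)))

# Read the full file with the detected delimiter.
df = spark.read.option("delimiter", delimiter).option("header", "true").csv("data.csv")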


2 Answers

You can write to CSV, choosing | as the delimiter:

df.write.option("sep","|").option("header","true").csv(filename)

The output will not look exactly like the pretty-printed table above (CSV has no padding or border rows), but the content will be the same, delimited by |.

Alternatively, you can collect to the driver and do it yourself, e.g.:

myprint(df.collect())

or

myprint(df.take(100))

df.collect and df.take return a list of rows.
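myprint is just a placeholder in the answer; a minimal sketch of what such a helper might look like, writing the rows as '|'-delimited text (the output path is illustrative, and everything must fit in driver memory):

def myprint(rows, path="output/summary.txt"):
    """Write a list of pyspark.sql.Row objects as '|'-delimited lines."""
    with open(path, "w") as f:
        if rows:
            # Header line built from the Row field names.
            f.write("|".join(rows[0].__fields__) + "\n")
        for row in rows:
            # Each Row iterates over its values in column order.
            f.write("|".join(str(value) for value in row) + "\n")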

Lastly, you can collect to the driver using toPandas() and use pandas tools.
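For example (a sketch, assuming the whole DataFrame fits in driver memory; the path is illustrative):

# toPandas() pulls everything to the driver as a pandas DataFrame,
# and pandas' to_csv can write a single file with a '|' separator.
df.toPandas().to_csv("output/summary.txt", sep="|", index=False)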

answered Oct 23 '22 by Assaf Mendelson


In Spark 2.0+, you can use the built-in CSV writer. The delimiter is , by default, and you can set it to |:

df.write \
    .format('csv') \
    .options(delimiter='|') \
    .save('target/location')
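Note that save() produces a directory of part files rather than a single file. If you want a header row and a single output file, one common variation is the sketch below (small data only, since coalesce(1) funnels everything through one partition):

# Coalesce to one partition so the output directory contains a single part file.
df.coalesce(1).write \
    .format('csv') \
    .option('header', 'true') \
    .option('sep', '|') \
    .save('target/location')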
answered Oct 22 '22 by mrsrinivas