I have constructed a Spark dataframe from a query. What I wish to do is print the dataframe to a text file with all information delimited by '|', like the following:
+-------+----+----+----+
|Summary|col1|col2|col3|
+-------+----+----+----+
|row1 |1 |14 |17 |
|row2 |3 |12 |2343|
+-------+----+----+----+
How can I do this?
In Spark, you can save (write/extract) a DataFrame to a CSV file on disk by using dataframeObj. write. csv("path") , using this you can also write DataFrame to AWS S3, Azure Blob, HDFS, or any Spark supported file systems.
Use spark. read. option("delimiter", "\t"). csv(file) or sep instead of delimiter .
We can use . textFile to get first row of csv file and capture the delimiter assign to variable. Save this answer.
You can try to write to csv choosing a delimiter of |
df.write.option("sep","|").option("header","true").csv(filename)
This would not be 100% the same but would be close.
Alternatively you can collect to the driver and do it youself e.g.:
myprint(df.collect())
or
myprint(df.take(100))
df.collect and df.take return a list of rows.
Lastly you can collect to the driver using topandas and use pandas tools
In Spark 2.0+, you can use in-built CSV writer. Here delimiter
is ,
by default and you can set it to |
df.write \
.format('csv') \
.options(delimiter='|') \
.save('target/location')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With