
Specifying the filename when saving a DataFrame as a CSV [duplicate]

Say I have a Spark DataFrame that I want to save to disk as a CSV file. In Spark 2.0.0+, one can get a DataFrameWriter from a DataFrame (Dataset[Row]) and use its .csv method to write the file.

The function is defined as

def csv(path: String): Unit

path : the location/folder name, not the file name.

Spark stores the CSV output at the specified location by creating files named part-*.csv.

Is there a way to save the CSV with a specified filename instead of part-*.csv? Or is it possible to specify a prefix to use instead of part-r?

Code :

df.coalesce(1).write.csv("sample_path") 

Current Output :

sample_path
|
+-- part-r-00000.csv

Desired Output :

sample_path
|
+-- my_file.csv

Note: coalesce(1) is used to output a single file, and the executor has enough memory to collect the DataFrame without a memory error.

Spandan Brahmbhatt asked Feb 01 '17 21:02





1 Answer

It's not possible to do this directly with Spark's save.

Spark uses the Hadoop file format, which requires data to be partitioned; that's why you get part- files. You can easily rename the file after the write finishes.

In Scala it will look like:

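import org.apache.hadoop.fs._

// sc is the active SparkContext; this assumes the DataFrame was written to "mydata.csv-temp"
val fs = FileSystem.get(sc.hadoopConfiguration)
// Name of the first part file in the output directory
val file = fs.globStatus(new Path("mydata.csv-temp/part*"))(0).getPath().getName()

// Move the part file to its final name, then delete the temporary output directory
fs.rename(new Path("mydata.csv-temp/" + file), new Path("mydata.csv"))
fs.delete(new Path("mydata.csv-temp"), true)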

or just:

import org.apache.hadoop.fs._

val fs = FileSystem.get(sc.hadoopConfiguration)
fs.rename(new Path("csvDirectory/data.csv/part-0000"), new Path("csvDirectory/newData.csv"))
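The rename step above could be wrapped in a small helper, so the part file's exact name never has to be hard-coded. This is only a sketch: renameSinglePartFile is a hypothetical name, not part of Spark or Hadoop, and it assumes the output directory holds exactly one part file (as after coalesce(1)). It relies only on the Hadoop FileSystem calls globStatus and rename, and is written to be pasted into spark-shell or a Scala script.

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Hypothetical helper (not a Spark or Hadoop API): renames the single part file
// inside outputDir to fileName, keeping it in the same directory.
def renameSinglePartFile(fs: FileSystem, outputDir: String, fileName: String): Boolean = {
  // globStatus can return null for some missing paths, so wrap the result in Option.
  val parts = Option(fs.globStatus(new Path(outputDir, "part-*")))
    .getOrElse(Array.empty[FileStatus])
  require(parts.length == 1,
    s"expected exactly one part file in $outputDir, found ${parts.length}")

  // rename returns true when the move succeeded.
  fs.rename(parts(0).getPath, new Path(outputDir, fileName))
}

// Usage sketch, assuming `spark` is an existing SparkSession and `df` a DataFrame:
//   df.coalesce(1).write.csv("sample_path")
//   val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
//   renameSinglePartFile(fs, "sample_path", "my_file.csv")
```

The helper takes the FileSystem as a parameter rather than creating one, so the same code can be pointed at HDFS or at a local file system. Marker files such as _SUCCESS are left in the directory and would need to be deleted separately if they are unwanted.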

Edit: As mentioned in the comments, you can also write your own OutputFormat; see the documentation for how to set the file name with this approach.

T. Gawęda answered Sep 20 '22 17:09
