
Python Spark: How to output an empty DataFrame to a CSV file (header only)?

I want to write an empty DataFrame to a CSV file. I use this code:

df.repartition(1).write.csv(path, sep='\t', header=True)

But because the DataFrame contains no data, Spark does not write the header to the CSV file. So I changed the code to:

if df.count() == 0:
    empty_data = [f.name for f in df.schema.fields]
    df = ss.createDataFrame([empty_data], df.schema)
    df.repartition(1).write.csv(path, sep='\t')
else:
    df.repartition(1).write.csv(path, sep='\t', header=True)

It works, but I want to ask whether there is a better way that avoids the count function.

Well asked Nov 29 '17 03:11


1 Answer

df.count() == 0 makes Spark count the rows of every partition of your DataFrame on the executors and send the result back to the driver, just to find out whether there is at least one row.

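In your case I would check whether df.take(1) comes back empty (Spark >= 2.1). In Scala that is df.take(1).isEmpty; in PySpark, take(1) returns a plain Python list, which has no isEmpty, so write len(df.take(1)) == 0 instead. Still slow, but preferable to a raw count().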
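As a rough sketch, this is how the take(1) check could be dropped into the code from the question. The helper names is_empty and write_tsv are made up for illustration, and spark stands for the SparkSession (ss in the question). One assumption worth flagging: the header row is built by passing the column names as the schema, so Spark infers string columns. Reusing df.schema for the header row, as in the question, would likely fail when a column is not a string type.

```python
def is_empty(df):
    """Return True if df has no rows, fetching at most one row to decide."""
    # In PySpark, take(1) returns a plain Python list of Row objects.
    return len(df.take(1)) == 0


def write_tsv(df, path, spark):
    """Write df as tab-separated CSV; write only a header line if df is empty.

    Hypothetical helper: `spark` is the SparkSession (`ss` in the question).
    """
    if is_empty(df):
        names = list(df.columns)
        # One-row DataFrame whose single row holds the column names.
        # Passing the names as the schema lets Spark infer string columns.
        header_df = spark.createDataFrame([tuple(names)], names)
        # No header=True here: the data row itself acts as the header line.
        header_df.repartition(1).write.csv(path, sep="\t")
    else:
        df.repartition(1).write.csv(path, sep="\t", header=True)
```

Usage would look like write_tsv(df, path, ss). When df is not empty, the check only needs to fetch a single row, whereas count() has to count every row.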

TMichel answered Oct 26 '22 17:10