I want to output empty dataframe to csv file. I use these codes:
df.repartition(1).write.csv(path, sep='\t', header=True)
But due to there is no data in dataframe, spark won't output header to csv file. Then I modify the codes to:
if df.count() == 0:
empty_data = [f.name for f in df.schema.fields]
df = ss.createDataFrame([empty_data], df.schema)
df.repartition(1).write.csv(path, sep='\t')
else:
df.repartition(1).write.csv(path, sep='\t', header=True)
It works, but I want to ask whether there are a better way without count function.
df.count() == 0
will make your driver program retrieve the count of all your dataframe partitions across the executors.
In your case I would use df.take(1).isEmpty
(Spark >= 2.1). Still slow, but preferable to a raw count()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With