Python Spark: How to output an empty DataFrame to a CSV file (header only)?

Question

I want to write an empty DataFrame to a CSV file. I use this code:

df.repartition(1).write.csv(path, sep='\t', header=True)

But because the DataFrame contains no data, Spark does not write the header to the CSV file. So I changed the code to:

if df.count() == 0:
    empty_data = [f.name for f in df.schema.fields]
    df = ss.createDataFrame([empty_data], df.schema)
    df.repartition(1).write.csv(path, sep='\t')
else:
    df.repartition(1).write.csv(path, sep='\t', header=True)

It works, but is there a better way that does not need the count function?

TMichel · Accepted Answer

df.count() == 0 forces Spark to count the rows of every partition of your DataFrame across the executors and return the total to your driver program.

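In your case I would use df.take(1).isEmpty (Spark >= 2.1). In PySpark, take(1) returns a plain Python list, so the equivalent check is len(df.take(1)) == 0. Still slow, but preferable to a raw count(), because Spark only needs to find one row instead of counting all of them.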
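
Putting the check together with the workaround from the question, a minimal sketch could look like the following. The function names is_empty and write_tsv are made up for this example, and spark stands for the active SparkSession (called ss in the question). One assumption that differs from the question: the header row is built with the column names as its schema rather than df.schema, so that every header cell is a string even when the original columns are not.

```python
def is_empty(df):
    """Return True when the DataFrame has no rows (hypothetical helper)."""
    # In PySpark, take(1) returns a Python list holding at most one Row,
    # so an empty list means the DataFrame is empty.
    return len(df.take(1)) == 0


def write_tsv(df, path, spark):
    """Write df to path as tab-separated text, header included.

    `spark` is the active SparkSession (named `ss` in the question).
    """
    if is_empty(df):
        # Workaround from the question: write the column names as one data row.
        # Assumption: using the names themselves as the schema makes every
        # column a string, unlike reusing df.schema.
        names = df.columns
        header_df = spark.createDataFrame([tuple(names)], names)
        header_df.repartition(1).write.csv(path, sep="\t")
    else:
        df.repartition(1).write.csv(path, sep="\t", header=True)
```

Usage would be write_tsv(df, path, ss). The emptiness check still runs a Spark job, so this only avoids the full count, not the extra pass over the data.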

Tags:

csv

apache-spark

pyspark

spark-dataframe
