How to write a dataframe in pyspark having null values to CSV

I'm using the code below to write to a CSV file:

df.coalesce(1).write \
    .format("com.databricks.spark.csv") \
    .option("header", "true") \
    .option("nullValue", " ") \
    .save("/home/user/test_table/")

When I execute it, I get the following error:

java.lang.UnsupportedOperationException: CSV data source does not support null data type.

Could anyone please help?

Asked Feb 07 '17 by Sreejith V
1 Answer

I had the same problem (though I wasn't using that command with the nullValue option), and I solved it with the fillna method.

I also noticed that fillna did not work on the _corrupt_record column, so I dropped it, since I didn't need it.

# Drop the column fillna cannot handle, replace remaining nulls with
# empty strings, then write the result as CSV with a header row.
df = df.drop('_corrupt_record')
df = df.fillna("")
df.write.option('header', 'true').format('csv').save('file_csv')
Answered Nov 13 '22 by Carlos Villacreces