How to write a dataframe in pyspark having null values to CSV

I'm using the code below to write to a CSV file:

df.coalesce(1).write \
    .format("com.databricks.spark.csv") \
    .option("header", "true") \
    .option("nullValue", " ") \
    .save("/home/user/test_table/")

When I execute it, I get the following error:

java.lang.UnsupportedOperationException: CSV data source does not support null data type.

Could anyone please help?

Asked Feb 07 '17 by Sreejith V
1 Answer

I had the same problem (though I wasn't using that command with the nullValue option), and I solved it with the fillna method.

I also noticed that fillna did not work on the _corrupt_record column, so I dropped it, since I didn't need it.

# Drop the column fillna cannot handle, replace remaining nulls with
# empty strings, then write the result as CSV with a header row.
df = df.drop('_corrupt_record')
df = df.fillna("")
df.write.option('header', 'true').format('csv').save('file_csv')
Answered Nov 13 '22 by Carlos Villacreces