I am writing a Dataset to JSON using:
ds.coalesce(1).write.format("json").option("nullValue",null).save("project/src/test/resources")
For records that have columns with null values, the JSON document does not write those keys at all.
Is there a way to force null-valued keys to appear in the JSON output?
I need this because I read this JSON back into another Dataset (in a test case), and I cannot enforce a schema if some documents are missing keys from the case class. (I read the JSON file from the resources folder and convert it to a Dataset via an RDD[String], as explained here: https://databaseline.bitbucket.io/a-quickie-on-reading-json-resource-files-in-apache-spark/)
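For context, a minimal sketch of the read path described above (untested; the case class `Person`, the resource name `/people.json`, and the `local[*]` master are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative case class; a missing "age" key would break schema inference
case class Person(name: String, age: Option[Long])

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Read the JSON resource from the classpath as lines, then let Spark parse them
val lines = scala.io.Source
  .fromInputStream(getClass.getResourceAsStream("/people.json"))
  .getLines()
  .toSeq

val people = spark.read.json(spark.createDataset(lines)).as[Person]
```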
I agree with @philantrovert.
ds.na.fill("")
.coalesce(1)
.write
.format("json")
.save("project/src/test/resources")
Since Datasets are immutable, you are not altering the data in ds, and you can continue to process it (complete with null values and all) in any following code. You are simply replacing null values with an empty string in the saved file. (Note that na.fill("") only fills nulls in string-typed columns; the numeric overloads of fill cover other types.)