CSVFileFormat
seems to read and write empty values as null for string columns. I have searched around but have been unable to find clear information about this, so I put together a simple test.
val df = session.createDataFrame(Seq(
(0, "a"),
(1, "b"),
(2, "c"),
(3, ""),
(4, null)
))
df.coalesce(1).write.mode("overwrite").format("csv")
.option("delimiter", ",")
.option("nullValue", "unknown")
.option("treatEmptyValuesAsNulls", "false")
.save(s"$path/test")
This outputs:
0,a
1,b
2,c
3,unknown
4,unknown
So, it appears to be treating both empty strings and null
values as null
. The same thing happens when reading a CSV file with empty quoted strings and nulls. Is there currently any way to treat these differently?
So it is better to use empty string for database as allowing NULL value forces the system to do extra work and does not give the data that you are looking for. However, NULL does not throw any exception when the count() function is executed.
The Java programming language distinguishes between null and empty strings. An empty string is a string instance of zero length, whereas a null string has no value at all. An empty string is represented as "" . It is a character sequence of zero characters.
In Spark 2.0. 0, blank values were read into DataFrames as empty strings and empty strings were read into DataFrames as null .
You can keep null values out of certain columns by setting nullable to false . You won't be able to set nullable to false for all columns in a DataFrame and pretend like null values don't exist. For example, when joining DataFrames, the join column will return null when a match cannot be made.
A mere two and a half years later, empty strings are no longer considered equal to null
values thanks to Spark 2.4.0! See this commit for a bit of detail on functionality. Your code will behave as expected under 2.4.0+:
val df = session.createDataFrame(Seq(
(0, "a"),
(1, "b"),
(2, "c"),
(3, ""),
(4, null)
))
df.coalesce(1).write.mode("overwrite").format("csv")
.option("delimiter", ",")
.option("nullValue", "unknown")
.option("treatEmptyValuesAsNulls", "false")
.save(s"$path/test")
Results in:
0,a
1,b
2,c
3,
4,unknown
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With