I saw a solution here, but when I tried it, it didn't work for me.
First I import a cars.csv file:
val df = sqlContext.read
.format("com.databricks.spark.csv")
.option("header", "true")
.load("/usr/local/spark/cars.csv")
Which looks like the following:
+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+
Then I do this:
df.na.fill("e", Seq("blank"))
But the null values didn't change.
Can anyone help me?
Replacing null values is one of the most common operations on PySpark DataFrames. It can be done with either DataFrame.fillna() or DataFrameNaFunctions.fill(); both replace NULL/None values in all or a selected subset of columns with a constant literal such as zero (0), an empty string, or a space.
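The equivalent DataFrameNaFunctions calls in Scala (the language of the question) look like this. This is a minimal sketch against the df above, with illustrative replacement values; note that since the CSV was read without schema inference, every column here is a string:

df.na.fill("e")                                       // replace nulls in every string column with "e"
df.na.fill("e", Seq("blank"))                         // replace nulls only in the "blank" column
df.na.fill(Map("comment" -> "none", "blank" -> "e"))  // per-column replacement values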
You can keep null values out of certain columns by setting nullable to false. However, you won't be able to set nullable to false for every column in a DataFrame and pretend that null values don't exist. For example, when performing an outer join, columns from the side without a match will contain null regardless of the schema.
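Here is a minimal sketch of how a join introduces nulls, assuming the sqlContext from the question; the tables and column names are made up for illustration:

import sqlContext.implicits._

val people = Seq(("alice", 1), ("bob", 2)).toDF("name", "deptId")
val depts  = Seq((1, "eng")).toDF("deptId", "deptName")

// bob has no matching department, so his deptName comes back null;
// na.fill can then substitute a constant for those nulls:
val joined = people.join(depts, people("deptId") === depts("deptId"), "left_outer")
joined.na.fill("unknown", Seq("deptName")).show()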
This is basically very simple. You'll need to create a new DataFrame. I'm using the DataFrame df that you defined earlier.

val newDf = df.na.fill("e", Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation whose result you need to keep, you have to assign the transformed DataFrame to a new value.
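You can see the difference by inspecting both values; the original df is untouched:

df.select("blank").show()     // still contains the original nulls
newDf.select("blank").show()  // nulls in "blank" replaced with "e"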
You can achieve the same in Java this way:

// fill() returns a new Dataset with nulls in numeric columns replaced by 0
Dataset<Row> filteredData = dataset.na().fill(0);