I saw a solution here, but when I tried it, it didn't work for me.
First I import a cars.csv file:
val df = sqlContext.read
.format("com.databricks.spark.csv")
.option("header", "true")
.load("/usr/local/spark/cars.csv")
Which looks like the following:
+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+
Then I do this:
df.na.fill("e", Seq("blank"))
But the null values didn't change.
Can anyone help me?
Replacing null values is one of the most common operations on PySpark DataFrames. It can be done with either DataFrame.fillna() or DataFrameNaFunctions.fill(); both replace NULL/None values in all or a selected subset of columns with a constant literal such as zero (0), an empty string, or a space.
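The equivalent DataFrameNaFunctions calls in Scala (the language of the question) look like this. This is a minimal sketch against the df above, with illustrative replacement values; note that since the CSV was read without schema inference, every column here is a string:

df.na.fill("e")                                       // replace nulls in every string column with "e"
df.na.fill("e", Seq("blank"))                         // replace nulls only in the "blank" column
df.na.fill(Map("comment" -> "none", "blank" -> "e"))  // per-column replacement values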
You can keep null values out of certain columns by setting nullable to false. However, you won't be able to set nullable to false for every column in a DataFrame and pretend that null values don't exist. For example, when performing an outer join, columns from the side without a match will contain null regardless of the schema.
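Here is a minimal sketch of how a join introduces nulls, assuming the sqlContext from the question; the tables and column names are made up for illustration:

import sqlContext.implicits._

val people = Seq(("alice", 1), ("bob", 2)).toDF("name", "deptId")
val depts  = Seq((1, "eng")).toDF("deptId", "deptName")

// bob has no matching department, so his deptName comes back null;
// na.fill can then substitute a constant for those nulls:
val joined = people.join(depts, people("deptId") === depts("deptId"), "left_outer")
joined.na.fill("unknown", Seq("deptName")).show()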
This is basically very simple. You'll need to create a new DataFrame. I'm using the DataFrame df that you defined earlier.

val newDf = df.na.fill("e", Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation whose result you need to keep, you have to assign the transformed DataFrame to a new value.
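You can see the difference by inspecting both values; the original df is untouched:

df.select("blank").show()     // still contains the original nulls
newDf.select("blank").show()  // nulls in "blank" replaced with "e"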
You can achieve the same in Java this way:

// fill() returns a new Dataset with nulls in numeric columns replaced by 0
Dataset<Row> filteredData = dataset.na().fill(0);