Attempting to remove rows in which a Spark DataFrame column contains blank strings. Originally did val df2 = df1.na.drop() but it turns out many of these values are being encoded as "".
I'm stuck using Spark 1.3.1 and also cannot rely on DSL. (Importing spark.implicit_ isn't working.)
In order to remove rows with NULL values on selected columns of a Spark DataFrame, use drop(columns: Seq[String]) or drop(columns: Array[String]) on df.na. Pass these functions the names of the columns you want to check for NULL values; rows with a NULL in any of those columns are deleted.
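A minimal Scala sketch, assuming an existing DataFrame df with hypothetical columns "name" and "state":

// `df` is assumed to already exist; keep only rows where neither listed column is null
val noNulls = df.na.drop(Seq("name", "state"))
noNulls.show()

// Equivalent call taking an Array of column names
val noNulls2 = df.na.drop(Array("name", "state"))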
In PySpark, the filter() or where() functions of DataFrame can select rows with NULL values by checking isNull() on the PySpark Column class; the result is returned as a new DataFrame. filter() and where() are aliases, so both forms produce the same output.
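Since the question is in Scala, here is a hedged equivalent using the Column API (the "state" column name is an assumption):

import org.apache.spark.sql.functions.col

// rows where state IS null
val nullRows = df.filter(col("state").isNull)
// rows where state is NOT null; where() is an alias for filter()
val nonNullRows = df.where(col("state").isNotNull)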
In a PySpark DataFrame, use the when().otherwise() SQL functions to find out whether a column has an empty value, and use the withColumn() transformation to replace the value of an existing column.
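A Scala sketch of the same idea; the "state" column and the "unknown" placeholder are assumptions:

import org.apache.spark.sql.functions.{when, col, lit}

// replace empty strings in "state" with a placeholder, leaving other values untouched
val patched = df.withColumn("state",
  when(col("state") === "", lit("unknown")).otherwise(col("state")))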
In Spark, the fill() function of the DataFrameNaFunctions class is used to replace NULL values in DataFrame columns with zero (0), an empty string, a space, or any other constant literal value.
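A short sketch of fill(); the column names and replacement values are hypothetical:

// replace nulls with an empty string in all string columns
val filledStrings = df.na.fill("")
// replace nulls per column with different constants
val filledByCol = df.na.fill(Map("state" -> "unknown", "age" -> 0))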
Removing things from a DataFrame requires filter().
val newDF = oldDF.filter("colName != ''")
or am I misunderstanding your question?
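Since the string-expression form of filter() does not rely on the implicit DSL conversions, it should also work on 1.3.1. A hedged variant that drops both NULLs and blank strings (colName is a placeholder):

// keep rows where the column is neither null nor an empty string
val cleanedDF = oldDF.filter("colName is not null and colName != ''")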
In case someone doesn't want to drop the records with blank strings, but just wants to convert the blank strings to some constant value:
val newdf = df.na.replace(df.columns, Map("" -> "0"))  // convert blank strings to "0"
newdf.show()