I am trying to create a new data frame by filtering out the rows where a field is null or an empty string, using the code below:
val df1 = df.filter(df("fieldA") != "").cache()
Then I got the following error:
<console>:32: error: overloaded method value filter with alternatives:
(conditionExpr: String)org.apache.spark.sql.DataFrame <and>
(condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
cannot be applied to (Boolean)
val df1 = df.filter(df("fieldA") != "").cache()
^
Does anyone know what I missed here? Thanks!
In Scala, to compare columns for equality or inequality you should use === and !== (or =!= in Spark 2.0+, where !== is deprecated):
val df1 = df.filter(df("fieldA") !== "").cache()
Alternatively, you can use an expression:
val df1 = df.filter("fieldA != ''").cache()
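Since the question mentions both nulls and empty strings, here is a minimal sketch that makes the null check explicit. It assumes Spark 2.0+ (SparkSession and =!=); the helper name nonEmptyRows, the object name, and the sample data are made up for illustration. Strictly speaking, a comparison against null already evaluates to null and filter drops such rows, so isNotNull mainly documents the intent.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FilterNonEmpty {
  // Hypothetical helper: keep rows whose `field` is neither null nor "".
  def nonEmptyRows(df: DataFrame, field: String): DataFrame =
    df.filter(col(field).isNotNull && (col(field) =!= ""))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-non-empty")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: a normal value, an empty string, and a null.
    val df = Seq(("1", "a"), ("2", ""), ("3", null)).toDF("id", "fieldA")

    // Only the row with id "1" passes both conditions.
    nonEmptyRows(df, "fieldA").show()

    spark.stop()
  }
}
```

On Spark 1.x the same condition can be written with !== instead of =!=.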
Your error happened because the != operator is defined on every Scala object: it compares the two objects immediately and always returns a Boolean. However, the filter function expects either a Column or an expression in a String. That is why the Column class defines its own !== operator, which returns another Column and can therefore be passed to filter the way you want.
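To make the difference concrete without needing Spark at all, here is a toy sketch. Col is a made-up stand-in, not Spark's real Column class, and the two filter overloads only mimic the signatures shown in the error message. It illustrates that != is evaluated on the spot, while a method such as =!= can build a description of the comparison instead.

```scala
// Toy stand-in for Spark's Column; NOT the real class, just the same idea.
final case class Col(expr: String) {
  // Builds a new expression object instead of evaluating anything.
  def =!=(other: Any): Col = Col(s"NOT ($expr = ${lit(other)})")

  // Renders a literal the way it would appear in a SQL-like string.
  private def lit(v: Any): String = v match {
    case s: String => s"'$s'"
    case x         => x.toString
  }
}

object OperatorDemo {
  // Mimics the two overloads from the error message: String or Col, never Boolean.
  def filter(conditionExpr: String): String = s"WHERE $conditionExpr"
  def filter(condition: Col): String = s"WHERE ${condition.expr}"

  def main(args: Array[String]): Unit = {
    val fieldA = Col("fieldA")

    // Universal inequality: compares the Col object with the String right away.
    // (The compiler may warn that this comparison always yields true.)
    val plain: Boolean = fieldA != ""

    // Col's own method: returns a Col describing the comparison.
    val built: Col = fieldA =!= ""

    println(plain)
    println(filter(built))
    // filter(plain) would not compile: there is no overload taking a Boolean.
  }
}
```

The commented-out filter(plain) call is the toy equivalent of the "cannot be applied to (Boolean)" error in the question.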
To see all the operations available on columns, the Column scaladoc is very useful. There is also the functions object (org.apache.spark.sql.functions).