 

DataFrame error: "overloaded method value filter with alternatives"

I am trying to create a new DataFrame by filtering out the rows in which fieldA is null or an empty string, using the code below:

val df1 = df.filter(df("fieldA") != "").cache()

Then I got the following error:

 <console>:32: error: overloaded method value filter with alternatives:
      (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
      (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
     cannot be applied to (Boolean)
                  val df1 = df.filter(df("fieldA") != "").cache()
                                 ^

Does anyone know what I missed here? Thanks!

asked May 19 '16 22:05 by Edamame


1 Answer

In Scala, to compare columns for equality you should use the Column operators === and !== (in Spark 2.0+, use =!= instead, since !== is deprecated):

val df1 = df.filter(df("fieldA") !== "").cache()

Alternatively, you can use an expression:

val df1 = df.filter("fieldA != ''").cache()
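A minimal sketch of both approaches side by side, assuming Spark 2.0+ with spark-sql on the classpath. The object name `FilterEmpty`, its helper methods, and the sample data are hypothetical, chosen only for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterEmpty {
  // Column-based filter: =!= (Spark 2.0+) returns a Column, which filter accepts.
  def withColumnOp(df: DataFrame): DataFrame =
    df.filter(df("fieldA") =!= "")

  // The same predicate written as a SQL expression string.
  def withSqlString(df: DataFrame): DataFrame =
    df.filter("fieldA != ''")

  // Builds hypothetical sample data and returns how many rows each filter keeps.
  def demo(spark: SparkSession): (Long, Long) = {
    import spark.implicits._
    val df = Seq("a", "", null, "b").toDF("fieldA")
    (withColumnOp(df).count(), withSqlString(df).count())
  }

  def main(args: Array[String]): Unit = {
    // Local session; master and app name are arbitrary choices for experimenting.
    val spark = SparkSession.builder().master("local[*]").appName("filter-empty").getOrCreate()
    try {
      println(demo(spark)) // both filters keep only "a" and "b"
    } finally {
      spark.stop()
    }
  }
}
```

Rows where fieldA is null are dropped as well: comparing null with "" yields null, and filter keeps only rows whose condition is true. The switch from !== to =!= comes from Scala's precedence rules: an operator such as !== that ends in = (and does not start with =) is parsed as an assignment operator with the lowest precedence, whereas =!= has the same precedence as ===.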

The error occurs because != is defined on every Scala object: it compares the two objects directly and always returns a Boolean. filter, however, expects either a Column or a SQL expression String. That is why the Column class defines !==, which returns another Column that can be passed to filter.
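To make the type mismatch concrete, here is a toy model in plain Scala with no Spark dependency. Col, Expr, and Toy.filter are hypothetical stand-ins for Spark's Column and DataFrame.filter, not the real classes; the point is only that != (inherited from Any) yields a Boolean, while a method like =!= can return a new column expression:

```scala
// Hypothetical stand-ins for Spark's expression tree (not the real Spark classes).
sealed trait Expr
final case class ColRef(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class NotEqual(left: Expr, right: Expr) extends Expr

final case class Col(expr: Expr) {
  // Like Spark's Column.=!=, this builds a new expression instead of comparing now.
  def =!=(other: Any): Col = Col(NotEqual(expr, Lit(other)))
}

object Toy {
  // Mirrors the two filter overloads listed in the error message.
  def filter(condition: Col): String = s"filter on expression: ${condition.expr}"
  def filter(conditionExpr: String): String = s"filter on SQL string: $conditionExpr"

  def main(args: Array[String]): Unit = {
    val fieldA = Col(ColRef("fieldA"))

    // != comes from Any: it compares the Col object with the String "" immediately.
    val b: Boolean = fieldA != ""
    println(b) // true: a Col is never equal to a String

    // =!= returns a Col, so overload resolution picks filter(condition: Col).
    println(filter(fieldA =!= ""))

    // filter(fieldA != "") would not compile: no overload accepts a Boolean.
  }
}
```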

See the Column scaladoc for all operations available on columns, and the functions object (org.apache.spark.sql.functions) for the built-in column functions.
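As an illustration of combining Column methods with the functions object, here is a sketch that spells out the null check explicitly, assuming Spark 2.0+. The object and method names and the sample data are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length}

object FilterWithFunctions {
  // Keeps rows whose column `name` is non-null and has at least one character.
  def nonEmpty(df: DataFrame, name: String): DataFrame =
    df.filter(col(name).isNotNull && length(col(name)) > 0)

  // Builds hypothetical sample data and returns how many rows survive the filter.
  def demo(spark: SparkSession): Long = {
    import spark.implicits._
    val df = Seq("a", "", null, "b").toDF("fieldA")
    nonEmpty(df, "fieldA").count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-functions").getOrCreate()
    try {
      println(demo(spark)) // keeps "a" and "b"
    } finally {
      spark.stop()
    }
  }
}
```

Writing isNotNull explicitly makes the intent (drop null or empty values) visible in the code, instead of relying on a null comparison evaluating to null.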

answered Nov 07 '22 00:11 by Daniel de Paula