 

Difference between filter and where in Scala Spark SQL


I've tried both, and they behave the same way.

Example:

val items = List(1, 2, 3)

Using filter:

employees.filter($"emp_id".isin(items:_*)).show()

Using where:

employees.where($"emp_id".isin(items:_*)).show()

The result is the same for both:

+------+------+------+-------+------+-------+
|EMP_ID|F_NAME|SALARY|DEPT_ID|L_NAME|MANAGER|
+------+------+------+-------+------+-------+
|     6|    E6|  2000|      4|    L6|      2|
|     7|    E7|  3000|      4|    L7|      1|
|     8|    E8|  4000|      2|    L8|      2|
|     9|    E9|  1500|      2|    L9|      1|
|    10|   E10|  1000|      2|   L10|      1|
|     4|    E4|   400|      3|    L4|      1|
|     2|    E2|   200|      1|    L2|      1|
|     3|    E3|   700|      2|    L3|      2|
|     5|    E5|   300|      2|    L5|      2|
+------+------+------+-------+------+-------+
Ishan asked Nov 24 '15 05:11


People also ask

What does filter() do in Spark?

In Spark, filter returns a new dataset formed by selecting those elements of the source on which the given function returns true. In other words, it keeps only the elements that satisfy the condition.
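To make this concrete, here is a minimal sketch, assuming Spark's Scala API (spark-sql) is on the classpath and a local session is acceptable; the object name RddFilterDemo and the sample numbers are hypothetical. It uses RDD.filter, whose predicate is an ordinary Scala function:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object: keeps only the elements for which the predicate is true.
object RddFilterDemo {
  def run(): List[Int] = {
    // Assumption: a local, single-threaded session is enough for the demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rdd-filter-demo")
      .getOrCreate()
    try {
      val numbers = spark.sparkContext.parallelize(1 to 6)
      // filter does not modify `numbers`; it returns a new RDD
      // containing only the elements for which the function returns true.
      val evens = numbers.filter(n => n % 2 == 0)
      evens.collect().toList
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run()) // prints List(2, 4, 6)
}
```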

What do Spark's filter() and where() do?

Spark's filter() and where() functions select the rows of a DataFrame or Dataset that satisfy one or more conditions, given either as Column expressions or as a SQL expression string. If you come from a SQL background, you can use where() instead of filter(); the two behave exactly the same.
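A minimal sketch of that equivalence, assuming a local SparkSession; FilterWhereDemo and the sample rows (a few names and salaries from the question's table) are hypothetical. The same condition is passed once as a Column and once as a SQL expression string, to both filter() and where():

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical demo: one condition, four spellings.
object FilterWhereDemo {
  def run(): Seq[Seq[String]] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("filter-where-demo")
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical sample rows taken from the question's table.
      val employees = Seq(("E2", 200), ("E6", 2000), ("E8", 4000), ("E9", 1500))
        .toDF("f_name", "salary")
      val results = Seq(
        employees.filter(col("salary") > 1000), // Column condition
        employees.where(col("salary") > 1000),  // same condition via where
        employees.filter("salary > 1000"),      // SQL expression string
        employees.where("salary > 1000")        // same string via where
      )
      // Collect only the names, sorted, so the four results are easy to compare.
      results.map(_.select("f_name").as[String].collect().toSeq.sorted)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

All four calls should return the same rows (E6, E8 and E9); which spelling to use is a matter of taste.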

What is the difference between == and === in Scala?

An important difference for Spark is the return type. For a Column, == returns a Boolean, whereas === returns a Column containing the row-by-row result of comparing the two columns.
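A small sketch of the difference, assuming a local SparkSession; EqualityDemo and its two-column sample data are hypothetical:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical demo contrasting Scala's == with Spark's === on Columns.
object EqualityDemo {
  def run(): (Boolean, Long) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("equality-demo")
      .getOrCreate()
    import spark.implicits._
    try {
      val df = Seq((1, 1), (2, 1), (3, 3)).toDF("emp_id", "manager")
      // == is ordinary Scala equality on the two Column objects:
      // evaluated once, on the driver, yielding a plain Boolean (false here,
      // because the two columns refer to different names).
      val plain: Boolean = col("emp_id") == col("manager")
      // === builds an expression that Spark evaluates for every row.
      val perRow: Column = col("emp_id") === col("manager")
      // Rows (1, 1) and (3, 3) satisfy the per-row comparison.
      (plain, df.filter(perRow).count())
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Because == yields a single Boolean rather than a Column, passing it to filter() or where() would not compile; row-wise comparisons need ===.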

What is the difference between select and selectExpr in Spark?

The select() method is useful when you simply need a subset of the columns of a Spark DataFrame. selectExpr(), on the other hand, is handy when you need to select columns and, at the same time, apply some transformation to one or more of them, written as SQL expressions.
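A hedged sketch contrasting the two, assuming a local SparkSession; SelectDemo, the sample rows and the double_salary alias are hypothetical. Note that select() also accepts transformed Column expressions such as col("salary") * 2, so selectExpr() is mainly a convenience when the transformation is easier to write as SQL text:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical demo: select() picks columns, selectExpr() takes SQL expressions.
object SelectDemo {
  def run(): (Seq[String], Seq[String], Seq[Int]) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("select-demo")
      .getOrCreate()
    import spark.implicits._
    try {
      val employees = Seq(("E2", 200), ("E6", 2000)).toDF("f_name", "salary")
      // select(): a subset of existing columns, unchanged.
      val picked = employees.select(col("f_name"), col("salary"))
      // selectExpr(): each argument is parsed as SQL, so a column can be
      // transformed and renamed in the same call.
      val transformed = employees.selectExpr("f_name", "salary * 2 AS double_salary")
      val doubled = transformed.select("double_salary").as[Int].collect().toSeq.sorted
      (picked.columns.toSeq, transformed.columns.toSeq, doubled)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```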


1 Answer

From the documentation for where:

Filters rows using the given condition. This is an alias for filter.

filter is simply the standard Scala (and FP in general) name for such a function, and where is for people who prefer SQL.
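To see the aliasing in practice, the question's example can be reproduced in a minimal sketch, assuming a local SparkSession; WhereAliasDemo and the four sample employees (taken from the question's table) are hypothetical. It compares the rows returned by filter() and where() and checks that their analyzed logical plans are equivalent, using the developer-level queryExecution and sameResult:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo: where() and filter() with the question's isin condition.
object WhereAliasDemo {
  def run(): (Seq[Int], Seq[Int], Boolean) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("where-alias-demo")
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical subset of the question's employees table.
      val employees = Seq((2, "E2", 200), (3, "E3", 700), (4, "E4", 400), (6, "E6", 2000))
        .toDF("emp_id", "f_name", "salary")
      val items = List(1, 2, 3)
      val viaFilter = employees.filter($"emp_id".isin(items: _*))
      val viaWhere = employees.where($"emp_id".isin(items: _*))
      // Sorted ids of the matching rows.
      def ids(df: DataFrame): Seq[Int] =
        df.select("emp_id").as[Int].collect().toSeq.sorted
      // Since where is an alias, both analyzed plans should be equivalent.
      val samePlan = viaFilter.queryExecution.analyzed
        .sameResult(viaWhere.queryExecution.analyzed)
      (ids(viaFilter), ids(viaWhere), samePlan)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```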

Alexey Romanov answered Sep 20 '22 23:09