I've tried both, and they work the same.
Example:
val items = List(1, 2, 3)
Using filter:
employees.filter($"emp_id".isin(items:_*)).show
Using where:
employees.where($"emp_id".isin(items:_*)).show
The result is the same for both:
+------+------+------+-------+------+-------+
|EMP_ID|F_NAME|SALARY|DEPT_ID|L_NAME|MANAGER|
+------+------+------+-------+------+-------+
|     6|    E6|  2000|      4|    L6|      2|
|     7|    E7|  3000|      4|    L7|      1|
|     8|    E8|  4000|      2|    L8|      2|
|     9|    E9|  1500|      2|    L9|      1|
|    10|   E10|  1000|      2|   L10|      1|
|     4|    E4|   400|      3|    L4|      1|
|     2|    E2|   200|      1|    L2|      1|
|     3|    E3|   700|      2|    L3|      2|
|     5|    E5|   300|      2|    L5|      2|
+------+------+------+-------+------+-------+
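The snippets above assume an existing employees DataFrame and the spark.implicits._ import. A self-contained version might look like the sketch below. It assumes a local SparkSession and builds a small hypothetical employees DataFrame from a few rows of the table; the object and method names are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object IsinFilterSketch {
  // Rows whose EMP_ID appears in ids, selected with filter()
  def withFilter(employees: DataFrame, ids: Seq[Int]): DataFrame =
    employees.filter(col("EMP_ID").isin(ids: _*))

  // The same condition passed to where(), which is an alias of filter()
  def withWhere(employees: DataFrame, ids: Seq[Int]): DataFrame =
    employees.where(col("EMP_ID").isin(ids: _*))

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch runs on its own
    val spark = SparkSession.builder()
      .appName("isin-filter-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A few rows copied from the table above (hypothetical sample data)
    val employees = Seq(
      (2, "E2", 200, 1, "L2", 1),
      (3, "E3", 700, 2, "L3", 2),
      (4, "E4", 400, 3, "L4", 1),
      (5, "E5", 300, 2, "L5", 2)
    ).toDF("EMP_ID", "F_NAME", "SALARY", "DEPT_ID", "L_NAME", "MANAGER")

    val items = List(1, 2, 3)
    withFilter(employees, items).show()
    withWhere(employees, items).show()

    spark.stop()
  }
}
```

With this sample data both calls keep only the rows with EMP_ID 2 and 3, because no row has EMP_ID 1.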
In Spark, the filter transformation returns a new dataset formed by selecting those elements of the source for which the given function returns true. In other words, it keeps only the elements that satisfy the condition.
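On a typed Dataset, filter can take a plain Scala predicate, which matches the "keep the elements on which the function returns true" description. The sketch below assumes a local SparkSession and a hypothetical Dataset[Int]; the predicate is stored as a named Int => Boolean value so the call goes straight to the filter(func: T => Boolean) overload.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object TypedFilterSketch {
  // Predicate stored as a function value (not a lambda literal), so the call
  // below resolves to Dataset.filter(func: T => Boolean)
  val isEven: Int => Boolean = n => n % 2 == 0

  // Returns a new Dataset holding only the elements for which isEven is true
  def keepEvens(numbers: Dataset[Int]): Dataset[Int] =
    numbers.filter(isEven)

  def main(args: Array[String]): Unit = {
    // Local session, assumed for illustration only
    val spark = SparkSession.builder()
      .appName("typed-filter-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val numbers = Seq(1, 2, 3, 4, 5, 6).toDS()
    keepEvens(numbers).show()

    spark.stop()
  }
}
```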
Spark's filter() and where() functions select the rows of a DataFrame or Dataset that satisfy one or more conditions, given either as Column expressions or as a SQL expression string. If you come from a SQL background, you can use where() instead of filter(); the two functions behave exactly the same.
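As a sketch of the "one or multiple conditions or SQL expression" point, the same two-part condition can be written with Column operators and passed to filter(), or written as a SQL string and passed to where(). The employees columns follow the table above, but the sample rows, the thresholds and all object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FilterWhereSketch {
  // Two predicates combined with && as Column expressions
  def highPaidInDept2(employees: DataFrame): DataFrame =
    employees.filter((col("DEPT_ID") === 2) && (col("SALARY") > 1000))

  // The same condition as a SQL expression string, passed to where()
  def highPaidInDept2Sql(employees: DataFrame): DataFrame =
    employees.where("DEPT_ID = 2 AND SALARY > 1000")

  def main(args: Array[String]): Unit = {
    // Local session, assumed for illustration only
    val spark = SparkSession.builder()
      .appName("filter-where-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows taken from the table above
    val employees = Seq(
      (3, "E3", 700, 2),
      (8, "E8", 4000, 2),
      (9, "E9", 1500, 2),
      (7, "E7", 3000, 4)
    ).toDF("EMP_ID", "F_NAME", "SALARY", "DEPT_ID")

    highPaidInDept2(employees).show()
    highPaidInDept2Sql(employees).show()

    spark.stop()
  }
}
```

Both versions keep the rows for EMP_ID 8 and 9; which form to use is mostly a matter of taste.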
For Column objects in Spark's Scala API, an important difference between == and === is the return value: == is Scala's ordinary equality and returns a Boolean, while === returns a Column that contains the result of comparing the elements of the two columns row by row.
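A minimal sketch of that difference, assuming a local SparkSession and a hypothetical two-column DataFrame: == only compares the two Column objects and yields a single Boolean, whereas === produces a Column that can be selected or used in filter().

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

object EqualityOperatorsSketch {
  // Scala's == compares the Column objects themselves and returns a plain Boolean
  def sameColumnReference(left: Column, right: Column): Boolean = left == right

  // Spark's === builds a new Column holding the row-by-row comparison
  def rowwiseEqual(left: Column, right: Column): Column = left === right

  def main(args: Array[String]): Unit = {
    // Local session, assumed for illustration only
    val spark = SparkSession.builder()
      .appName("equality-operators-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 1), (2, 3), (4, 4)).toDF("a", "b")

    // Prints false: col("a") and col("b") are different column references
    println(sameColumnReference(col("a"), col("b")))

    // One Boolean per row, exposed as a new column
    df.select(col("a"), col("b"), rowwiseEqual(col("a"), col("b")).as("a_equals_b")).show()

    // Keeps only the rows where a and b are equal
    df.filter(rowwiseEqual(col("a"), col("b"))).show()

    spark.stop()
  }
}
```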
The select() method is useful when you simply need to select a subset of columns from a Spark DataFrame. selectExpr(), on the other hand, comes in handy when you need to select particular columns and, at the same time, apply some transformation to one or more of them.
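A short sketch of the contrast, assuming a local SparkSession and a hypothetical employees DataFrame with the F_NAME and SALARY columns from the table. The flat bonus of 100 and the SALARY_WITH_BONUS alias are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectVsSelectExprSketch {
  // select(): a plain subset of existing columns
  def nameAndSalary(employees: DataFrame): DataFrame =
    employees.select("F_NAME", "SALARY")

  // selectExpr(): pick one column and transform another with a SQL expression
  // in the same call (the bonus of 100 is a hypothetical value)
  def nameAndSalaryWithBonus(employees: DataFrame): DataFrame =
    employees.selectExpr("F_NAME", "SALARY + 100 AS SALARY_WITH_BONUS")

  def main(args: Array[String]): Unit = {
    // Local session, assumed for illustration only
    val spark = SparkSession.builder()
      .appName("select-vs-selectexpr-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val employees = Seq((6, "E6", 2000, 4), (7, "E7", 3000, 4))
      .toDF("EMP_ID", "F_NAME", "SALARY", "DEPT_ID")

    nameAndSalary(employees).show()
    nameAndSalaryWithBonus(employees).show()

    spark.stop()
  }
}
```

Note that select() can also transform columns when given Column expressions such as col("SALARY") + 100; selectExpr() is the shortcut for writing such transformations as SQL strings.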
The documentation for where says: "Filters rows using the given condition. This is an alias for filter."
filter is simply the standard Scala (and functional programming in general) name for such a function, and where is for people who prefer SQL.