I would like to add a where condition on a column with multiple values in a DataFrame.
It works for a single value, for example:
df.where($"type" === "type1" && $"status" === "completed")
How can I match multiple values for the same column, like below?
df.where($"type" IN ("type1","type2") && $"status" IN ("completed","inprogress"))
In Spark, the isin() function checks whether a DataFrame column value exists in a list/array of values. To express IS NOT IN, negate the result of isin() with the NOT operator (! on a Column in Scala).
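A minimal sketch of the negated form, assuming a DataFrame with a string column named type as in the question. The object name NotInExample, the helper excludeTypes, and the sample rows are hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NotInExample {
  // Hypothetical helper: keep only rows whose `type` is NOT one of `excluded`.
  // `!` negates the boolean Column produced by isin.
  def excludeTypes(df: DataFrame, excluded: Seq[String]): DataFrame =
    df.where(!col("type").isin(excluded: _*))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: Spark 2.x or later).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("not-in-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq(
      ("type1", "completed"),
      ("type2", "inprogress"),
      ("type3", "failed")
    ).toDF("type", "status")

    excludeTypes(df, Seq("type1", "type2")).show()
    spark.stop()
  }
}
```

Note that rows where type is null are dropped by this filter as well, because the negated comparison evaluates to null rather than true.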
The method you are looking for is isin:
import sqlContext.implicits._
df.where($"type".isin("type1","type2") and $"status".isin("completed","inprogress"))
Typically, you want to do something like this:
val types = Seq("type1","type2")
val statuses = Seq("completed","inprogress")
df.where($"type".isin(types:_*) and $"status".isin(statuses:_*))
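For anyone who wants to try this end to end, here is a self-contained sketch of the same filter. It assumes Spark 2.x or later, so it uses SparkSession and spark.implicits._ in place of sqlContext.implicits._. The object name IsinExample, the helper filterByTypeAndStatus, and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object IsinExample {
  // Hypothetical helper wrapping the filter from the answer above.
  // isin takes varargs, so each Seq is expanded with `: _*`.
  def filterByTypeAndStatus(df: DataFrame,
                            types: Seq[String],
                            statuses: Seq[String]): DataFrame =
    df.where(col("type").isin(types: _*) && col("status").isin(statuses: _*))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isin-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq(
      ("type1", "completed"),
      ("type2", "inprogress"),
      ("type3", "completed"),
      ("type1", "failed")
    ).toDF("type", "status")

    val types = Seq("type1", "type2")
    val statuses = Seq("completed", "inprogress")

    // Column-based filter using isin.
    filterByTypeAndStatus(df, types, statuses).show()

    // The same condition as a SQL expression string, which is close to
    // the IN syntax written in the question.
    df.where("type IN ('type1', 'type2') AND status IN ('completed', 'inprogress')").show()

    spark.stop()
  }
}
```

The string form is handy for fixed value lists. When the values come from a collection at runtime, the isin form avoids building SQL text by hand.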