I'm working on a Spark application (using Scala) and I have a List that contains multiple values. I'd like to use this list to write a where
clause for my DataFrame and select only a subset of rows. For example, my List contains 'value1', 'value2', and 'value3', and I would like to write something like this:
mydf.where($"col1" === "value1" || $"col1" === "value2" || $"col1" === "value3")
How can I do that programmatically, given that the list contains many values?
You can map a list of values to a list of "filters" (with type Column), and reduce this list into a single filter by applying the || operator on every two filters:
val possibleValues = Seq("value1", "value2", "value3")
val result = mydf.where(possibleValues.map($"col1" === _).reduce(_ || _))
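One caveat with the snippet above is that reduce throws on an empty list. A minimal sketch of a variant that tolerates that case, assuming a local SparkSession and a made-up three-row DataFrame; the object name WhereFromList and the helper anyOf are hypothetical names introduced here for illustration. It also shows Column.isin, a different built-in technique that expresses the same membership test without building the OR chain by hand:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.lit

object WhereFromList {
  // Hypothetical helper: builds one equality filter per value and ORs them together.
  // foldLeft starts from lit(false), so an empty list yields a filter that matches
  // no rows instead of throwing, which reduce would do.
  def anyOf(column: Column, values: Seq[String]): Column =
    values.map(column === _).foldLeft(lit(false))(_ || _)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only to make the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("where-from-list")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for mydf.
    val mydf = Seq("value1", "value2", "value4").toDF("col1")
    val possibleValues = Seq("value1", "value2", "value3")

    // Same idea as the answer: map to filters, then combine with ||.
    val viaOrChain = mydf.where(anyOf($"col1", possibleValues))

    // Alternative: isin takes the values as varargs.
    val viaIsin = mydf.where($"col1".isin(possibleValues: _*))

    viaOrChain.show()
    viaIsin.show()

    spark.stop()
  }
}
```

Both filters should select the same rows for non-null values. Which one to prefer is a judgment call: isin is shorter for plain membership tests, while the map-and-combine form generalizes to conditions other than equality.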