I'm trying to write a multi-value filter for a Spark SQL DataFrame.
I have:
val df: DataFrame // my data
val field: String // The field of interest
val values: Array[Any] // The allowed possible values
and I'm trying to come up with the filter specification.
At the moment, I have:
val filter = values.map(value => df(field) === value).reduce(_ || _)
But this isn't robust when I get passed an empty array of values. To cover that case, I would like:
val filter = values.map(value => df(field) === value).fold(falseColumn)(_ || _)
but I don't know how to specify falseColumn.
Anyone know how to do so?
And is there a better way of writing this filter? (If so, I still need to know how to get a falseColumn - I need a trueColumn for a separate piece.)
A column that is always true:
val trueColumn = lit(true)
A column that is always false:
val falseColumn = lit(false)
Using lit(...) (from org.apache.spark.sql.functions) means these will always be valid columns, regardless of what columns the DataFrame contains.
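For completeness, here is a minimal sketch of how the fold from the question fits together with lit(false), assuming df, field, and values are defined as in the question:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Fold over lit(false) so an empty `values` array yields a filter
// that matches no rows instead of throwing on reduce.
val filter = values
  .map(value => df(field) === value)
  .fold(lit(false))(_ || _)

val filtered: DataFrame = df.filter(filter)

With lit(true) as the seed and && as the operator, the same pattern builds a filter that every condition must satisfy. For this particular membership test, Column.isin may also read more cleanly, e.g. df(field).isin(values: _*), though you would still want the lit columns for the general case.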