I'm using Spark 1.4.0. This is what I have so far:
data.filter($"myColumn".in(lit("A"), lit("B"), lit("C"), ...))
The function lit converts a literal to a column.
Ideally I would put my A, B, C in a Set and check like this:
val validValues = Set("A", "B", "C", ...)
data.filter($"myColumn".in(validValues))
What's the correct syntax? Are there any alternative concise solutions?
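Spark 1.4 or older (in takes a varargs list of Columns, so wrap each value with lit and expand the collection with : _*):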
val validValues = Set("A", "B", "C").map(lit(_))
data.filter($"myColumn".in(validValues.toSeq: _*))
Spark 1.5 or newer (isin accepts plain values as varargs, so lit is no longer needed):
val validValues = Set("A", "B", "C")
data.filter($"myColumn".isin(validValues.toSeq: _*))
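For anyone who wants to try the isin variant end to end, here is a minimal sketch. It assumes Spark 2.0 or newer (it uses SparkSession, which does not exist in 1.4 or 1.5) running in local mode. The object name IsinExample, the helper filterValid, and the sample rows are hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object; not part of the original question or answer.
object IsinExample {

  // Keep only the rows whose "myColumn" value is in validValues.
  // isin takes varargs (Any*), so the Set is converted to a Seq and expanded with `: _*`.
  def filterValid(data: DataFrame, validValues: Set[String]): DataFrame =
    data.filter(col("myColumn").isin(validValues.toSeq: _*))

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x+ session is acceptable for the demo.
    val spark = SparkSession.builder()
      .appName("isin-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with a single string column.
    val data = Seq("A", "B", "D", "E").toDF("myColumn")
    val validValues = Set("A", "B", "C")

    // Only the rows with "A" and "B" pass the filter.
    filterValid(data, validValues).show()

    spark.stop()
  }
}
```

Passing the Set directly does not behave as intended because isin expects the individual values as separate arguments; a Set passed as-is is treated as one single value, which is why the toSeq: _* expansion is needed.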