
Need a TRUE and FALSE column in Spark-SQL

I'm trying to write a multi-value filter for a Spark SQL DataFrame.

I have:

val df: DataFrame      // my data
val field: String      // The field of interest
val values: Array[Any] // The allowed possible values

and I'm trying to come up with the filter specification.

At the moment, I have:

val filter = values.map(value => df(field) === value).reduce(_ || _)

But this isn't robust in the case where I get passed an empty list of values. To cover that case, I would like:

val filter = values.map(value => df(field) === value).fold(falseColumn)(_ || _)

but I don't know how to specify falseColumn.

Anyone know how to do so?

And is there a better way of writing this filter? (If so, I still need the answer for how to get a falseColumn - I need a trueColumn for a separate piece).

Nathan Kronenfeld asked Feb 14 '17 05:02



1 Answer

A column that is always true:

import org.apache.spark.sql.functions.lit
val trueColumn = lit(true)

A column that is always false:

val falseColumn = lit(false)

Using lit(...) means these will always be valid columns, regardless of what columns the DataFrame contains.
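To tie this back to the filter in the question, here is a minimal sketch of the fold seeded with lit(false). The helper name anyOf, the sample data, and the local SparkSession are assumptions made for illustration, and col(field) stands in for the question's df(field). The last line shows Column.isin as one possible alternative when the list of values is known to be non-empty.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object MultiValueFilter {
  // Hypothetical helper: OR together one equality test per allowed value.
  // Seeding the fold with lit(false) means an empty `values` matches no rows.
  def anyOf(field: String, values: Seq[Any]): Column =
    values.map(v => col(field) === v).foldLeft(lit(false))(_ || _)

  def main(args: Array[String]): Unit = {
    // Local session and sample data are assumptions for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("multi-value-filter")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "n")

    df.filter(anyOf("key", Seq("a", "c"))).show() // keeps the "a" and "c" rows
    df.filter(anyOf("key", Seq.empty)).show()     // keeps no rows

    // Possible alternative for a non-empty list of values.
    df.filter(col("key").isin("a", "c")).show()

    spark.stop()
  }
}
```

The same pattern with lit(true) and && gives a neutral starting point when every condition in a list must hold.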

socom1880 answered Nov 12 '22 19:11
