I'm trying to filter a Spark DataFrame using a list in Java.
java.util.List<Long> selected = ....;
DataFrame result = df.filter(df.col("something").isin(????));
The problem is that the isin(...) method accepts a Scala Seq or varargs. Passing in JavaConversions.asScalaBuffer(selected) doesn't work either.
Any ideas?
In Spark, isin() is the DataFrame counterpart of the SQL IN operator: it is a function of the Column class that returns true when the value of the expression is contained in the supplied list of values. To express IS NOT IN, negate the result of isin() with the NOT operator.
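As an illustration, a minimal sketch of both forms in the Java API could look like the following, assuming df is the DataFrame from the question and the values are hypothetical:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.not;

// SQL IN: keep rows whose "something" value is one of the given values.
df.filter(col("something").isin(1L, 2L, 3L));

// SQL NOT IN: negate the isin() result with not().
df.filter(not(col("something").isin(1L, 2L, 3L)));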
Use the stream() method to turn the list into an array for the varargs overload:

df.filter(col("something").isin(selected.stream().toArray(Long[]::new)));
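Putting it together, a runnable sketch might look like the one below. The class name, column name, and sample values are illustrative, and it assumes the Spark 2.x SparkSession/Dataset API rather than the older DataFrame class from the question:

import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class IsInExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("isin-example")
                .getOrCreate();

        // Hypothetical DataFrame with a single long column named "something".
        Dataset<Row> df = spark.range(0, 10).toDF("something");

        // The values to keep, held in a java.util.List as in the question.
        List<Long> selected = Arrays.asList(1L, 3L, 5L);

        // Convert the list to a Long[] so it is expanded into the isin(Object...) varargs.
        Dataset<Row> result = df.filter(col("something").isin(selected.stream().toArray(Long[]::new)));
        result.show();

        // Equivalent alternative: List.toArray() returns an Object[], which isin(Object...) accepts as-is.
        df.filter(col("something").isin(selected.toArray())).show();

        spark.stop();
    }
}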