How to use Column.isin in Java?

I'm trying to filter a Spark DataFrame using a list in Java.

java.util.List<Long> selected = ....;
DataFrame result = df.filter(df.col("something").isin(????));

The problem is that the isin(...) method accepts a Scala Seq or varargs.

Passing in JavaConversions.asScalaBuffer(selected) doesn't work either.

Any ideas?

Boris asked Nov 07 '16 15:11
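The varargs issue raised in the question can be shown without Spark. Spark declares Column.isin in Scala with the @scala.annotation.varargs annotation, so Java callers see it as isin(Object...). The sketch below uses a hypothetical stand-in, countArgs(Object... values), that has the same parameter shape but is not part of Spark. A non-array object, such as a java.util.List or the Scala buffer returned by JavaConversions.asScalaBuffer, arrives as one single argument. An Object[] is passed through as the varargs array, so each element becomes its own argument.

```java
import java.util.Arrays;
import java.util.List;

public class VarargsDemo {
    // Hypothetical stand-in with the same parameter shape Java sees for Column.isin
    static int countArgs(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        List<Long> selected = Arrays.asList(1L, 2L, 3L);

        // The whole List (or any non-array object, such as a Scala Buffer) becomes a single argument
        int asOneObject = countArgs(selected);

        // An Object[] is used directly as the varargs array, so each element is a separate argument
        int asArray = countArgs(selected.toArray());

        System.out.println(asOneObject + " " + asArray); // prints "1 3"
    }
}
```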


People also ask

How do I add a column to a DataFrame in Java?

A new column can be added to an existing Dataset using the Dataset.withColumn() method. withColumn accepts two arguments, the name of the column to add and a Column expression, and returns a new Dataset<Row>.

How do I use the isin() function in PySpark?

PySpark isin() (the IN operator) is used to check or filter whether DataFrame values exist in a list of values. isin() is a method of the Column class that returns True if the value of the expression is contained in the evaluated values of the arguments.

Is there a NOT IN operator in Spark SQL?

In Spark, the isin() function checks whether a DataFrame column value exists in a list or array of values. To express NOT IN, negate the result of isin() with the NOT operator.
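The IN / NOT IN semantics can be mimicked on an in-memory list to make the negation concrete. This is a plain-Java sketch with hypothetical helper names (keepIn, keepNotIn), not Spark code; in Spark's Java API the negated filter would be written with functions.not(col(...).isin(...)). The sketch ignores SQL null handling: in Spark a null column value makes both IN and NOT IN evaluate to null, so such rows are dropped by either filter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class InNotInDemo {
    // Keeps values contained in the set: the in-memory analogue of col.isin(...)
    static List<Long> keepIn(List<Long> values, Set<Long> selected) {
        return values.stream().filter(selected::contains).collect(Collectors.toList());
    }

    // Keeps values NOT contained in the set: the analogue of not(col.isin(...))
    static List<Long> keepNotIn(List<Long> values, Set<Long> selected) {
        return values.stream().filter(v -> !selected.contains(v)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> column = Arrays.asList(1L, 2L, 3L, 4L);
        Set<Long> selected = new HashSet<>(Arrays.asList(2L, 4L));
        System.out.println(keepIn(column, selected));    // [2, 4]
        System.out.println(keepNotIn(column, selected)); // [1, 3]
    }
}
```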

How do you use like() in PySpark?

In Spark and PySpark, the like() function is similar to the SQL LIKE operator: it matches values against a pattern containing the wildcard characters % (any sequence of characters) and _ (exactly one character). You can use it to filter DataFrame rows on one or more conditions, or inside when() to derive a new column.
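The wildcard semantics can be modelled by translating a LIKE pattern into a Java regular expression. This is a simplified sketch with a hypothetical like helper, not Spark's implementation: it treats % as any sequence of characters and _ as exactly one character, matches case-sensitively against the whole value, and does not handle escape characters.

```java
import java.util.regex.Pattern;

public class LikeDemo {
    // Simplified sketch: translate a SQL LIKE pattern into a regex and match the whole value.
    // Escape characters in the pattern are not supported.
    static boolean like(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");          // any sequence of characters
            } else if (c == '_') {
                regex.append('.');           // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        // DOTALL so the wildcards also match line terminators
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("Spark", "Sp%"));   // true
        System.out.println(like("Spark", "Sp_rk")); // true
        System.out.println(like("Spark", "sp%"));   // false (case-sensitive)
        System.out.println(like("Spark", "Sp_"));   // false
    }
}
```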


1 Answer

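Use the stream method to convert the list to an array, which isin's varargs parameter accepts (with col statically imported from org.apache.spark.sql.functions):

// Long[]::new matches the element type of List<Long>; String[]::new would throw ArrayStoreException
df.filter(col("something").isin(selected.stream().toArray(Long[]::new)));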
Shankar answered Sep 23 '22 16:09
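The array's component type matters at runtime: Stream.toArray(generator) throws ArrayStoreException when an element cannot be stored in the generated array. The plain-Java sketch below (hypothetical helper names, no Spark dependency) shows both cases for a List<Long>.

```java
import java.util.Arrays;
import java.util.List;

public class ToArrayDemo {
    // Builds an array whose component type matches the list's elements
    static Long[] toLongArray(List<Long> selected) {
        return selected.stream().toArray(Long[]::new);
    }

    // Returns true if building a String[] from Long elements fails at runtime
    static boolean stringArrayFails(List<Long> selected) {
        try {
            String[] unused = selected.stream().toArray(String[]::new);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Long> selected = Arrays.asList(10L, 20L, 30L);
        System.out.println(Arrays.toString(toLongArray(selected))); // [10, 20, 30]
        System.out.println(stringArrayFails(selected));             // true
    }
}
```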