How to use Column.isin with list?

    val items = List("a", "b", "c")

    sqlContext.sql("select c1 from table")
      .filter($"c1".isin(items))
      .collect
      .foreach(println)

The code above throws the following exception.

    Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(a, b, c)
        at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49)
        at org.apache.spark.sql.functions$.lit(functions.scala:89)
        at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642)
        at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.apache.spark.sql.Column.isin(Column.scala:642)

Below is my attempt to fix it. It compiles and runs but doesn't return any match. Not sure why.

    val items = List("a", "b", "c").mkString("\"","\",\"","\"")

    sqlContext.sql("select c1 from table")
      .filter($"c1".isin(items))
      .collect
      .foreach(println)

Nabegh asked Sep 13 '15 16:09


1 Answer

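According to the documentation, isin takes varargs, not a List: its signature is isin(list: Any*), so the parameter name "list" is misleading here. When you pass a List directly, the whole List is treated as a single value, and Spark fails because it cannot turn a List into a literal. You can expand your List into varargs with :_* like this: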

    val items = List("a", "b", "c")

    sqlContext.sql("select c1 from table")
      .filter($"c1".isin(items:_*))  // :_* passes each element as a separate argument
      .collect
      .foreach(println)

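Your variant with mkString compiles because a single String is also a valid vararg (with the number of arguments equal to 1), but it is probably not what you want to achieve: the column is compared against the one string "a","b","c" (quotes and commas included), so no row matches.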
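
To see the difference between the three calls without running Spark, here is a minimal plain-Scala sketch. Note that describeArgs and VarargsDemo are hypothetical names made up for this illustration; describeArgs is only a stand-in with the same parameter shape as isin (Any*) and simply reports what it receives.

```scala
object VarargsDemo {
  // Hypothetical stand-in for Column.isin: same parameter shape (Any*),
  // but it only describes the arguments instead of building a filter.
  def describeArgs(values: Any*): String =
    s"${values.length} argument(s): ${values.mkString(" | ")}"

  def main(args: Array[String]): Unit = {
    val items = List("a", "b", "c")

    // Passing the List itself: the whole List is one argument.
    println(describeArgs(items))
    // prints: 1 argument(s): List(a, b, c)

    // Expanding with : _* passes each element as its own argument.
    println(describeArgs(items: _*))
    // prints: 3 argument(s): a | b | c

    // The mkString attempt builds one String, so it is again one argument.
    println(describeArgs(items.mkString("\"", "\",\"", "\"")))
    // prints: 1 argument(s): "a","b","c"
  }
}
```

The first call corresponds to the version that throws in Spark (one value of type List), the second to the working isin(items:_*) call, and the third to the mkString attempt that runs but matches nothing.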

TheMP answered Sep 28 '22 11:09