val items = List("a", "b", "c")
sqlContext.sql("select c1 from table")
  .filter($"c1".isin(items))
  .collect
  .foreach(println)
The code above throws the following exception.
Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(a, b, c)
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49)
    at org.apache.spark.sql.functions$.lit(functions.scala:89)
    at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642)
    at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.sql.Column.isin(Column.scala:642)
Below is my attempt to fix it. It compiles and runs but doesn't return any match. Not sure why.
val items = List("a", "b", "c").mkString("\"","\",\"","\"")
sqlContext.sql("select c1 from table")
  .filter($"c1".isin(items))
  .collect
  .foreach(println)
According to the documentation, isin takes a vararg, not a list. The parameter name list is actually confusing here. You can try converting your List to varargs like this:
val items = List("a", "b", "c")
sqlContext.sql("select c1 from table")
  .filter($"c1".isin(items: _*))
  .collect
  .foreach(println)
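Your variant with mkString compiles because a single String is also a valid vararg (with the number of arguments equal to 1), but it is probably not what you want to achieve: isin then compares c1 against the one literal string "a","b","c" (quotes and commas included), so no row matches.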
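To see the difference between the three calls without a Spark session, here is a minimal plain-Scala sketch. Note that isInSketch is a hypothetical stand-in I made up for illustration: it only mimics the varargs shape of isin (list: Any*) and is not Spark code, so treat it as a rough model of the argument passing rather than of what Spark does internally.

```scala
// Hypothetical stand-in for Column.isin: it only copies the varargs
// signature (list: Any*) and checks membership. This is not Spark code.
def isInSketch(value: Any, list: Any*): Boolean = list.contains(value)

val items = List("a", "b", "c")

// Passing the List itself: the varargs receive ONE argument, the whole List.
// (In Spark this is the call that fails with "Unsupported literal type".)
val asSingleList = isInSketch("a", items)

// Expanding with `: _*` passes three separate arguments: "a", "b", "c".
val expanded = isInSketch("a", items: _*)

// mkString builds ONE string whose content is "a","b","c" (quotes included),
// so the varargs again receive a single argument.
val joined = items.mkString("\"", "\",\"", "\"")
val asSingleString = isInSketch("a", joined)

println(s"list as one arg: $asSingleList")
println(s"expanded: $expanded")
println(s"joined string: $joined, as one arg: $asSingleString")
```

Only the expanded call finds "a", which matches the behaviour described above: the mkString version compiles and runs, but it compares against one long string.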