Is this even possible in a Spark DataFrame (1.6/2.1)?
val data="some variable"
df.filter("column1"> data)
I can do this with a static value but can't figure out how to filter by a variable.
In Spark and PySpark, the contains() function matches rows where a column value contains a given literal string (a partial, substring match); it is mostly used to filter rows of a DataFrame.
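A minimal sketch of contains() in action, using a hypothetical two-row dataset (the column names and values here are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("contains-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample data for illustration
val people = Seq(("James Smith", 30), ("Anna Jones", 25)).toDF("name", "age")

// contains() does a partial (substring) match on the column value,
// so this keeps only the "James Smith" row
people.filter(col("name").contains("Smith")).show()
```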
You can select one or more columns of a DataFrame by passing the column names to the select() function. Since DataFrames are immutable, this returns a new DataFrame containing only the selected columns. The show() function prints the DataFrame contents.
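A quick sketch of select() and show(), again on a hypothetical dataset (names invented for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("select-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample data
val people = Seq(("James", "Smith", 30), ("Anna", "Jones", 25))
  .toDF("first_name", "last_name", "age")

// select() returns a NEW DataFrame; `people` itself is unchanged
val names = people.select("first_name", "last_name")
names.show()
```

Note that `people` still has all three columns afterwards; only `names` is narrowed.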
The Spark where() function filters rows of a DataFrame or Dataset based on one or more conditions or a SQL expression. It can be used instead of filter() by users coming from a SQL background; where() and filter() behave identically.
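Both forms of where() can be sketched as follows, on invented sample data; the SQL-expression string and the Column expression produce the same result:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("where-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample data
val people = Seq(("James", 30), ("Anna", 25)).toDF("name", "age")

// Column-expression form
people.where(col("age") > 26).show()
// Equivalent SQL-expression form
people.where("age > 26").show()
// filter() accepts exactly the same arguments
people.filter(col("age") > 26).show()
```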
import org.apache.spark.sql.functions._

val data = "some variable"
// col() references the column; lit() wraps the Scala value as a literal Column
df.filter(col("column1") > lit(data))
I'm not sure how you accomplished that with a literal either, since what you have doesn't match any of the filter method signatures.
So yes, you can work with a non-literal. Try this:
import sparkSession.implicits._
df.filter($"column1" > data)
Note the $, which uses an implicit conversion to turn the String into the Column with that name. This Column in turn has a > method that takes an Any and returns a new Column; that Any will be your data value.
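Putting it together, a self-contained sketch with a hypothetical string column (note that > on a string column compares lexicographically):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("dollar-demo").getOrCreate()
import spark.implicits._

// Hypothetical data; column1 holds strings, compared lexicographically
val df = Seq("apple", "zebra", "mango").toDF("column1")

val data = "kiwi" // the non-literal filter value
// $"column1" becomes a Column via implicit conversion; its > method accepts Any
df.filter($"column1" > data).show()
```

Here "zebra" and "mango" sort after "kiwi", so two rows survive the filter.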