Is this even possible with a Spark DataFrame (1.6/2.1)?
val data="some variable"
df.filter("column1"> data)
I can do this with a static value but can't figure out how to filter by a variable.
In Spark and PySpark, the contains() function matches when a column value contains a given literal string (i.e., it matches on part of the string); it is most often used to filter rows of a DataFrame.
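A minimal sketch of contains()-based filtering, assuming a hypothetical DataFrame with a single name column (the sample rows are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("contains-example").getOrCreate()
import spark.implicits._

// Hypothetical sample data
val df = Seq("James Smith", "Anna Rose", "Robert Williams").toDF("name")

// contains() matches on part of the string, so "Rose" matches "Anna Rose"
val matched = df.filter(col("name").contains("Rose"))
matched.show()
```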
You can select one or more columns of a DataFrame by passing the column names to the select() function. Since a DataFrame is immutable, this creates a new DataFrame containing only the selected columns. The show() function displays the DataFrame's contents.
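For example, with a hypothetical three-column DataFrame, select() returns a new DataFrame and leaves the original untouched:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("select-example").getOrCreate()
import spark.implicits._

// Hypothetical sample data
val df = Seq(("James", "Sales", 3000), ("Anna", "HR", 4100)).toDF("name", "dept", "salary")

// select() returns a NEW DataFrame; df itself still has all three columns
val projected = df.select("name", "dept")
projected.show()
```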
The Spark where() function filters rows of a DataFrame or Dataset based on one or more conditions or a SQL expression. Users with a SQL background may prefer where() over filter(), but the two functions behave identically.
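A short sketch of that equivalence, using hypothetical sample data: the Column form and the SQL-expression string form of where(), and filter(), all select the same rows.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("where-example").getOrCreate()
import spark.implicits._

// Hypothetical sample data
val df = Seq(("James", 3000), ("Anna", 4100)).toDF("name", "salary")

val byColumn = df.where(col("salary") > 3500)  // Column condition
val byFilter = df.filter(col("salary") > 3500) // filter() behaves the same as where()
val bySqlExpr = df.where("salary > 3500")      // SQL expression string
```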
import org.apache.spark.sql.functions._
val data="some variable"
df.filter(col("column1") > lit(data))
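A self-contained version of that fix (the column name and sample rows are hypothetical; since column1 holds strings here, the > comparison is lexicographic):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder().master("local[*]").appName("filter-by-variable").getOrCreate()
import spark.implicits._

// Hypothetical sample data
val df = Seq("apple", "zebra").toDF("column1")

val data = "some variable"
// lit() wraps the Scala variable in a Column literal so it can be compared
val filtered = df.filter(col("column1") > lit(data))
filtered.show()
```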
I'm not sure how you accomplished that with a literal either, since what you have doesn't match any of the filter method signatures. So yes, you can work with a non-literal value, but try this:
import sparkSession.implicits._
df.filter($"column1" > data)
Note the `$`, which uses an implicit conversion to turn the `String` into the `Column` named by that `String`. This `Column` has a `>` method that takes an `Any` and returns a new `Column`. That `Any` will be your `data` value.
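A runnable sketch of the `$` approach, again with hypothetical sample data:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("dollar-filter").getOrCreate()
import spark.implicits._  // brings the $"..." StringContext conversion into scope

// Hypothetical sample data
val df = Seq("apple", "zebra").toDF("column1")
val data = "some variable"

// $"column1" becomes a Column; its > method accepts Any, including the data variable
val filtered = df.filter($"column1" > data)
filtered.show()
```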