 

Filter Spark Dataframe with a variable

Is this even possible with a Spark DataFrame (1.6/2.1)?

val data = "some variable"

df.filter("column1" > data)   // how do I filter column1 by the value of data?

I can do this with a static value but can't figure out how to filter by a variable.

asked Apr 23 '17 by Goby Bala

People also ask

How do you check if a column contains a particular value in PySpark?

In Spark and PySpark, the contains() function matches when a column's value contains a given literal string (it matches on part of the string), and it is mostly used to filter rows of a DataFrame.
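
For illustration, a minimal Scala sketch (df is assumed to be an existing DataFrame; the column name column1 and the substring are hypothetical):

import org.apache.spark.sql.functions.col

// Keep only rows whose column1 value contains the substring "abc"
val matched = df.filter(col("column1").contains("abc"))
matched.show()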

How do I select a specific column in PySpark?

You can select one or more columns of a DataFrame by passing the column names you want to the select() function. Since DataFrames are immutable, this creates a new DataFrame containing only the selected columns. The show() function displays the DataFrame's contents.
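
A brief Scala sketch of the same idea (the column names here are made up):

// Each select() call returns a new DataFrame with only the named columns
val singleCol = df.select("column1")
val multiCols = df.select("column1", "column2")
multiCols.show()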

What is difference between filter and where in Spark DataFrame?

The Spark where() function filters rows from a DataFrame or Dataset based on one or more conditions or a SQL expression. Users coming from a SQL background can use where() instead of filter(); the two functions behave exactly the same.
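
For example, the following Scala sketch (assuming a SparkSession named spark and a numeric column1) produces identical results either way:

import spark.implicits._

// where() is an alias for filter(); both lines return the same rows
df.filter($"column1" > 10).show()
df.where($"column1" > 10).show()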


2 Answers

import org.apache.spark.sql.functions._

val data = "some variable"
// lit() wraps the Scala value in a Column literal so it can be compared with column1
df.filter(col("column1") > lit(data))
answered Sep 28 '22 by pasha701


I'm not sure how you accomplished that with a literal either since what you have doesn't match any of the filter method signatures.

So yes, you can work with a non-literal, but try this:

import sparkSession.implicits._
df.filter($"column1" > data)

Note the $, which uses implicit conversion to turn the String into the Column named with that String. Meanwhile, this Column has a > method that takes an Any and returns a new Column. That Any will be your data value.
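
Putting it together, a minimal self-contained sketch (the SparkSession setup and the sample data are assumptions for demonstration, not part of the original answer):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("example").master("local[*]").getOrCreate()
import spark.implicits._

val data = "some variable"
// Hypothetical sample DataFrame with a single string column
val df = Seq("a value", "zzz value").toDF("column1")

// $"column1" is converted to a Column; > builds a Column predicate against the variable
df.filter($"column1" > data).show()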

answered Sep 27 '22 by Vidya