What is the correct syntax for filtering on multiple columns in the Scala API? If I want to do something like this:
dataFrame.filter($"col01" === "something" && $"col02" === "something else")
or
dataFrame.filter($"col01" === "something" || $"col02" === "something else")
EDIT:
This is what my original code looks like. Everything comes in as a string.
import org.apache.spark.sql.{functions => sqlf}

df.select($"userID" as "user", $"itemID" as "item", $"quantity" cast("int"), $"price" cast("float"), $"discount" cast("float"), sqlf.substring($"datetime", 0, 10) as "date", $"group")
  .filter($"item" !== "" && $"group" !== "-1")
Method 1: Using filter(). The filter() method returns a new DataFrame restricted to the rows that satisfy the given condition, dropping the rows that don't. We are going to filter the DataFrame on multiple columns: filter() takes a condition and returns the filtered DataFrame.
PySpark Filter with Multiple Conditions. In PySpark, to filter() rows of a DataFrame on multiple conditions, you can use either a Column with a condition or a SQL expression. A simple sketch using an AND (&) condition follows; you can extend it with OR (|) and NOT (~) conditional expressions as needed.
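Since the thread is about the Scala API, here is a minimal sketch of those combinations in Scala; the DataFrame df and the columns state and gender are made up for illustration:

import org.apache.spark.sql.functions.col

// AND: keep rows where both conditions hold
df.filter(col("state") === "OH" && col("gender") === "M")

// OR: keep rows where either condition holds
df.filter(col("state") === "OH" || col("gender") === "M")

// NOT: negate a condition (Scala uses ! where PySpark uses ~)
df.filter(!(col("state") === "OH"))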
Select Single & Multiple Columns. You can select single or multiple columns of the DataFrame by passing the column names you want to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns.
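For example, a quick sketch against the same hypothetical df:

// Select a single column
val names = df.select("name")

// Select multiple columns; df itself is untouched, a new DataFrame is returned
val pairs = df.select("name", "age")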
To subset or filter the data from a DataFrame we use the filter() function, which keeps rows on the basis of a given condition; the condition can be single or multiple, and df is the DataFrame from which the data is subset or filtered.
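Besides Column expressions, filter() also accepts a SQL expression string, which can be more readable for multiple conditions; a sketch with the same hypothetical columns:

// The same filter written as a SQL expression string
val subset = df.filter("state = 'OH' AND gender = 'M'")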
I think I see what the issue is, and it isn't Spark refusing two !='s in the same filter. In Scala, any operator that ends in = (and doesn't start with =) is treated as an assignment operator with the lowest possible precedence, so !== binds more loosely than &&. Your expression therefore doesn't parse as ($"item" !== "") && ($"group" !== "-1"), and it fails to compile; parenthesizing each comparison like that is one fix.
Now, for your code to work, you can also use notEqual to do the filter:
import org.apache.spark.sql.functions.col
df.filter(col("item").notEqual("") && col("group").notEqual("-1"))
or chain two filters in the same statement; each !== on its own is fine, since the precedence problem only shows up once && is involved:
df.filter($"item" !== "").filter($"group" !== "-1").select(....)
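Worth noting: since Spark 2.0 the !== operator on Column is deprecated in favor of =!=. Because =!= starts with =, Scala gives it ordinary comparison precedence, so it combines with && directly; a sketch against the code above:

// =!= gets comparison precedence, so no extra parentheses are needed
df.filter($"item" =!= "" && $"group" =!= "-1")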
The Spark Column API documentation can help with the different comparison methods available.