I wanted to evaluate two conditions in when(), like this:
import pyspark.sql.functions as F

df = df.withColumn('trueVal',
    F.when(df.value < 1 OR df.value2 == 'false', 0).otherwise(df.value))
For this I get 'invalid syntax' for using 'OR'.
I also tried nested when statements:
df = df.withColumn('v',
    F.when(df.value < 1,
           F.when(df.value =1, 0).otherwise(df.value))
     .otherwise(df.value))
For this I get "keyword can't be an expression" for the nested when statements.
How can I use multiple conditions in when(), or is there a workaround?
PySpark when() is a SQL function that returns a Column, and otherwise() is a method of Column; if otherwise() is not used, unmatched rows get a None/NULL value. This mirrors the SQL expression CASE WHEN cond1 THEN result1 WHEN cond2 THEN result2 ... ELSE default END.
In Spark SQL, filter() and where() give the same result; there is no difference between the two. filter() is simply the standard Scala name for such a function, and where() exists for people who prefer SQL. Both filter the rows of a DataFrame or Dataset based on one or more conditions or a SQL expression, and they operate exactly the same.
Like the SQL "case when" statement and the "switch" / "if then else" statements from popular programming languages, Spark SQL DataFrames support similar syntax using "when otherwise", or a "case when" statement written as a SQL expression.
pyspark.sql.DataFrame.where takes a Boolean Column as its condition. When using PySpark, it's often useful to think "Column Expression" when you read "Column".
Logical operations on PySpark columns use the bitwise operators:

& for and
| for or
~ for not

When combining these with comparison operators such as <, parentheses are often needed, because the bitwise operators bind more tightly than the comparisons.
In your case, the correct statement is:
import pyspark.sql.functions as F

df = df.withColumn('trueVal',
    F.when((df.value < 1) | (df.value2 == 'false'), 0).otherwise(df.value))
See also: SPARK-8568