PySpark: when function with multiple outputs [duplicate]

I am trying to use a "chained when" function; in other words, I'd like the expression to produce more than two possible outputs.

I tried using the same logic as nested IF functions in Excel:

  df.withColumn("device_id", when(col("device")=="desktop",1)).otherwise(when(col("device")=="mobile",2)).otherwise(null))

But that doesn't work since I can't put a tuple into the "otherwise" function.

Fede asked Mar 01 '17 16:03



1 Answer

Have you tried:

from pyspark.sql import functions as F
df.withColumn('device_id', F.when(F.col('device') == 'desktop', 1)
                            .when(F.col('device') == 'mobile', 2)
                            .otherwise(None))

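Note that when chaining `when` calls, you do not need to wrap the successive calls in `otherwise`. The conditions are checked in order and the first one that matches supplies the value; `otherwise` only sets the value for rows where no condition matches (if you leave it out, those rows get null).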
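To make the evaluation order concrete, here is a plain-Python sketch (no Spark needed) of the rule a `when(...).when(...).otherwise(...)` chain follows: conditions are tried in order, the first match wins, and the fallback applies only when nothing matches. The helper name `case_when` and the sample rows are hypothetical, purely for illustration; in PySpark the actual work is done by `F.when` and `.otherwise` on Column expressions.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


# Hypothetical helper (not a PySpark API) mirroring the CASE WHEN logic of a
# chained when(): branches are (predicate, value) pairs tried in order.
def case_when(
    row: dict,
    branches: Iterable[Tuple[Callable[[dict], bool], Any]],
    otherwise: Optional[Any] = None,  # like omitting .otherwise(): unmatched -> None (null)
) -> Any:
    for predicate, value in branches:
        if predicate(row):
            return value  # first matching branch wins; later ones are skipped
    return otherwise


# Plain-Python analogue of:
#   F.when(F.col('device') == 'desktop', 1).when(F.col('device') == 'mobile', 2)
device_branches = [
    (lambda r: r["device"] == "desktop", 1),
    (lambda r: r["device"] == "mobile", 2),
]

rows = [{"device": "desktop"}, {"device": "mobile"}, {"device": "tablet"}]
device_ids = [case_when(r, device_branches) for r in rows]
print(device_ids)  # [1, 2, None]
```

In this sketch, passing a second `case_when(...)` result as the `otherwise` fallback would give the same answers as one flat branch list, which is why a flat chain of `.when()` calls makes the nested `otherwise(when(...))` wrapper unnecessary.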

Grr answered Oct 04 '22 22:10