Subtract two columns with null in a Spark DataFrame

I am new to Spark. I have a DataFrame df:

+----------+------------+-----------+
| Column1  | Column2    | Sub       |                          
+----------+------------+-----------+
| 1        | 2          | 1         |                                         
+----------+------------+-----------+
| 4        | null       | null      |                          
+----------+------------+-----------+
| 5        | null       | null      |                          
+----------+------------+-----------+
| 6        | 8          | 2         |                          
+----------+------------+-----------+

When I subtract the two columns, any row where one of the columns is null also gets null in the result column.

df.withColumn("Sub", col("Column1") - col("Column2"))

Expected output should be:

+----------+------------+-----------+
|  Column1 | Column2    | Sub       |                          
+----------+------------+-----------+
| 1        | 2          | 1         |                                           
+----------+------------+-----------+
| 4        | null       | 4         |                          
+----------+------------+-----------+
| 5        | null       | 5         |                          
+----------+------------+-----------+
| 6        | 8          | 2         |                          
+----------+------------+-----------+

I don't want to replace the nulls in Column2 with 0; Column2 itself should stay null. Can someone help me with this?

warner asked Dec 11 '22 09:12


2 Answers

You can use the when function:

import org.apache.spark.sql.functions._
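// Treat null as 0 on each side of the subtraction; the source columns are left unchanged.
// In the Scala API, isNull is called without parentheses.
df.withColumn("Sub",
  when(col("Column1").isNull, lit(0)).otherwise(col("Column1")) -
  when(col("Column2").isNull, lit(0)).otherwise(col("Column2")))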

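You should get the following result. The differences are signed; wrap the whole expression in abs(...) if you need the positive values from the question's expected output.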

+-------+-------+----+
|Column1|Column2| Sub|
+-------+-------+----+
|      1|      2|-1.0|
|      4|   null| 4.0|
|      5|   null| 5.0|
|      6|      8|-2.0|
+-------+-------+----+
Ramesh Maharjan answered Feb 01 '23 22:02


You can coalesce nulls to zero on both columns and then do the subtraction:

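import org.apache.spark.sql.functions.{abs, coalesce, lit}
import spark.implicits._  // assumes a SparkSession named spark, as in spark-shell

// None becomes null in the DataFrame
val df = Seq[(Option[Int], Option[Int])](
  (Some(1), Some(2)),
  (Some(4), None),
  (Some(5), None),
  (Some(6), Some(8))
).toDF("A", "B")

// Nulls are treated as 0 only inside the expression; A and B themselves are unchanged
df.withColumn("Sub", abs(coalesce($"A", lit(0)) - coalesce($"B", lit(0)))).show()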
+---+----+---+
|  A|   B|Sub|
+---+----+---+
|  1|   2|  1|
|  4|null|  4|
|  5|null|  5|
|  6|   8|  2|
+---+----+---+
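
To see why the plain subtraction yields null while the coalesce version does not, the per-row logic can be modelled in plain Scala without Spark. This is only a rough sketch: Option[Int] stands in for a nullable column value, and subNullPropagating and subCoalesced are hypothetical helper names, not Spark functions. It can be pasted into the Scala REPL or spark-shell.

```scala
// Plain-Scala model of the per-row logic; Option[Int] stands in for a nullable column value.
// subNullPropagating and subCoalesced are hypothetical names used for illustration only.

// Mirrors col("A") - col("B"): if either side is null (None), the result is null (None).
def subNullPropagating(a: Option[Int], b: Option[Int]): Option[Int] =
  for (x <- a; y <- b) yield x - y

// Mirrors abs(coalesce(A, 0) - coalesce(B, 0)): a null counts as 0 only inside the expression.
def subCoalesced(a: Option[Int], b: Option[Int]): Int =
  math.abs(a.getOrElse(0) - b.getOrElse(0))

// Same four rows as the example DataFrame
val rows: Seq[(Option[Int], Option[Int])] =
  Seq((Some(1), Some(2)), (Some(4), None), (Some(5), None), (Some(6), Some(8)))

val propagated = rows.map { case (a, b) => subNullPropagating(a, b) }
val coalesced  = rows.map { case (a, b) => subCoalesced(a, b) }

println(propagated) // List(Some(-1), None, None, Some(-2))
println(coalesced)  // List(1, 4, 5, 2)
```

The input rows are never modified in either version, which corresponds to column B keeping its nulls in the DataFrame output.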
Psidom answered Feb 01 '23 22:02