
How to use the except function with a Spark DataFrame

I would like to get the differences between two DataFrames, but return only the fields that differ in each changed row. For example, I have two DataFrames as follows:

val DF1 = Seq(
    (3,"Chennai",  "rahman",9846, 45000,"SanRamon"),
    (1,"Hyderabad","ram",9847, 50000,"SF")
).toDF("emp_id","emp_city","emp_name","emp_phone","emp_sal","emp_site")

val DF2 = Seq(
    (3,"Chennai",  "rahman",9846, 45000,"SanRamon"),
    (1,"Sydney","ram",9847, 48000,"SF")
).toDF("emp_id","emp_city","emp_name","emp_phone","emp_sal","emp_site")

The only differences between these two DataFrames are emp_city and emp_sal in the second row. I am currently using the except function, which returns the entire row, as follows:

DF1.except(DF2)

+------+---------+--------+---------+-------+--------+
|emp_id| emp_city|emp_name|emp_phone|emp_sal|emp_site|
+------+---------+--------+---------+-------+--------+
|     1|Hyderabad|     ram|     9847|  50000|      SF|
+------+---------+--------+---------+-------+--------+

However, I need the output to be like this:

+------+---------+-------+
|emp_id| emp_city|emp_sal|
+------+---------+-------+
|     1|Hyderabad|  50000|
+------+---------+-------+

This shows the cells that differ, along with emp_id.

Edit: if a column's value has changed, it should appear; if it has not changed, it should be hidden or null.

milad ahmadi asked Sep 03 '25 09:09



1 Answer

The following should give you the result you are looking for.

DF1.except(DF2).select("emp_id","emp_city","emp_sal")
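Note that the select above hardcodes the columns that happen to differ in this sample. To cover the edit in the question (changed cells shown, unchanged cells null), one option is to join the two DataFrames on the key and compare column by column. The following is only a sketch under some assumptions: emp_id is a unique key present in both DataFrames, both have the same columns, and there is at least one non-key column. The object name ChangedCells and the helper changedCells are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ChangedCells {
  // Hypothetical helper: keeps `key` and, for every other column, the value
  // from `left` when it differs from `right`, otherwise null.
  // Assumes `key` is unique in both inputs and both share the same columns.
  def changedCells(left: DataFrame, right: DataFrame, key: String): DataFrame = {
    val valueCols = left.columns.filter(_ != key)
    val l = left.alias("l")
    val r = right.alias("r")

    // when(...) without otherwise(...) yields null if the condition is false.
    // <=> is the null-safe equality operator, so null vs. value is a change.
    val diffCols = valueCols.map { c =>
      when(!(col(s"l.$c") <=> col(s"r.$c")), col(s"l.$c")).as(c)
    }

    // True when at least one non-key column differs.
    val anyChange = valueCols
      .map(c => !(col(s"l.$c") <=> col(s"r.$c")))
      .reduce(_ || _)

    // Inner join: rows whose key exists only in `left` are dropped here.
    l.join(r, col(s"l.$key") === col(s"r.$key"))
      .where(anyChange)
      .select(col(s"l.$key").as(key) +: diffCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("changed-cells")
      .getOrCreate()
    import spark.implicits._

    val DF1 = Seq(
      (3, "Chennai", "rahman", 9846, 45000, "SanRamon"),
      (1, "Hyderabad", "ram", 9847, 50000, "SF")
    ).toDF("emp_id", "emp_city", "emp_name", "emp_phone", "emp_sal", "emp_site")

    val DF2 = Seq(
      (3, "Chennai", "rahman", 9846, 45000, "SanRamon"),
      (1, "Sydney", "ram", 9847, 48000, "SF")
    ).toDF("emp_id", "emp_city", "emp_name", "emp_phone", "emp_sal", "emp_site")

    changedCells(DF1, DF2, "emp_id").show()
    spark.stop()
  }
}
```

For the sample data this should return a single row for emp_id 1, with emp_city and emp_sal filled in from DF1 and the remaining columns null. Unlike except, this approach relies on a key column to pair rows, so it does not report rows that exist in only one of the DataFrames; a left join or a separate except would be needed for those.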

Trung Nguyen answered Sep 04 '25 21:09
