
How to replace NULL with 0 in a left outer join on a Spark DataFrame (v1.6)

I am working with Spark v1.6. I have the following two DataFrames, and I want to convert the null values to 0 in the result of my left outer join. Any suggestions?

DataFrames

val x: Array[Int] = Array(1,2,3)
val df_sample_x = sc.parallelize(x).toDF("x")

val y: Array[Int] = Array(3,4,5)
val df_sample_y = sc.parallelize(y).toDF("y")

Left Outer Join

val df_sample_join = df_sample_x
  .join(df_sample_y, df_sample_x("x") === df_sample_y("y"), "left_outer")

ResultSet

scala> df_sample_join.show

x  |  y
--------
1  |  null
2  |  null
3  |  3

But I want the result set to be displayed as:

scala> df_sample_join.show

x  |  y
--------
1  |  0
2  |  0
3  |  3
asked Nov 23 '16 18:11 by Prasan



1 Answer

Just use na.fill on the joined DataFrame, restricted to the y column:

df_sample_join.na.fill(0, Seq("y"))
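For completeness, here is a minimal end-to-end sketch of the same approach as a standalone program, assuming Spark 1.6 with a local master. The object name `NullToZeroExample` and the helper `joinAndFill` are made up for illustration; in spark-shell you would already have `sc` and `sqlContext` and could skip the setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical wrapper object, not part of the original question or answer.
object NullToZeroExample {

  // Builds the two sample DataFrames, left-outer-joins them,
  // and replaces nulls in column "y" with 0.
  def joinAndFill(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._ // enables .toDF on RDDs

    val sc = sqlContext.sparkContext
    val dfX = sc.parallelize(Seq(1, 2, 3)).toDF("x")
    val dfY = sc.parallelize(Seq(3, 4, 5)).toDF("y")

    // Rows of dfX without a match in dfY get y = null.
    val joined = dfX.join(dfY, dfX("x") === dfY("y"), "left_outer")

    // na.fill only touches the listed columns; "x" is left as-is.
    joined.na.fill(0, Seq("y"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for demonstration.
    val conf = new SparkConf().setAppName("null-to-zero").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    joinAndFill(sqlContext).show()

    sc.stop()
  }
}
```

Note that na.fill returns a new DataFrame rather than modifying the original, so the result has to be assigned or used directly. If you prefer to do the replacement per column inside a select, `coalesce(col("y"), lit(0))` from `org.apache.spark.sql.functions` should give the same values.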
answered Nov 13 '22 21:11 by user6022341 (2 revs)
