I need to implement NVL function in spark while joining two dataframes.
Input Dataframes :
ds1.show()
---------------
|key | Code |
---------------
|2 | DST |
|3 | CPT |
|null | DTS |
|5 | KTP |
---------------
ds2.show()
------------------
|key | PremAmt |
------------------
|2 | 300 |
|-1 | -99 |
|5 | 567 |
------------------
Need to implement "LEFT JOIN NVL(DS1.key, -1) = DS2.key" . So I have written like this, but NVL or Coalesce function is missing .so it returned wrong values.
How to incorporate "NVL" in spark dataframes ?
// nvl function is missing, so wrong output
ds1.join(ds1,Seq("key"),"left_outer")
-------------------------
|key | Code |PremAmt |
-------------------------
|2 | DST |300 |
|3 | CPT |null |
|null | DTS |null |
|5 | KTP |567 |
-------------------------
Expected Result :
-------------------------
|key | Code |PremAmt |
-------------------------
|2 | DST |300 |
|3 | CPT |null |
|null | DTS |-99 |
|5 | KTP |567 |
-------------------------
I know one complex way.
val df = df1.join(df2, coalesce(df1("key"), lit(-1)) === df2("key"), "left_outer")
You should rename column name "key" of one df, and drop the column after join.
import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions.{when, lit};
def nvl(ColIn: Column, ReplaceVal: Any): Column = {
return(when(ColIn.isNull, lit(ReplaceVal)).otherwise(ColIn))
}
Now you can use nvl as you would use any other function for data frame manipulation, like
val NewDf = DF.withColumn("MyColNullsReplaced", nvl($"MyCol", "<null>"))
Obviously, Replaceval
must be of the correct type. The example above assumes $"MyCol"
is of type string.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With