Is there a left outer join equivalent in Spark Scala? I understand there is a join operation that is equivalent to a database inner join.
A left outer join produces a table with all of the keys from the left table; any rows without a matching key in the right table have null values in the fields that would otherwise be populated by the right table.
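As a minimal sketch with the DataFrame API (the table data, column names, and app name here are made up for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("left-outer-join-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data: not every id in `left` has a match in `right`
val left  = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "leftVal")
val right = Seq((1, "x"), (3, "y")).toDF("id", "rightVal")

// Every row of `left` is kept; rightVal is null where no id matches (id = 2 here)
left.join(right, Seq("id"), "left_outer").show()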
The left anti join in PySpark uses the same join API, but it returns only the columns of the left DataFrame, and only for the records that have no match in the right DataFrame.
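The same join type is available from Scala via the "left_anti" join-type string; a sketch reusing the hypothetical left/right DataFrames above:

// Only rows of `left` whose id has NO match in `right` survive,
// and only `left`'s columns are returned (id = 2 in the sample data)
left.join(right, Seq("id"), "left_anti").show()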
The triple equals operator === is normally the Scala type-safe equals operator, analogous to the one in JavaScript. Spark overrides this with a method on Column that creates a new Column comparing the column on the left with the object on the right; the result is a boolean-valued Column expression, not a Scala Boolean, and it is evaluated per row when the query runs.
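For example (again with the hypothetical DataFrames above), === builds a join condition as a Column expression:

// left("id") === right("id") is a Column (an expression tree),
// not a Boolean evaluated on the driver
val joinExpr = left("id") === right("id")
left.join(right, joinExpr, "left_outer").show()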
Spark Scala does support left outer joins. Have a look here: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.api.java.JavaPairRDD
Usage is quite simple:
rdd1.leftOuterJoin(rdd2)
but you have to make sure both RDDs are pair RDDs, i.e. every element is a (key, value) tuple.
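A minimal runnable sketch with made-up data (note the Option on the right side of the result, which is None for unmatched keys):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("rdd-left-outer-join").setMaster("local[*]"))

// Pair RDDs: each element is a (key, value) tuple
val rdd1 = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
val rdd2 = sc.parallelize(Seq((1, "x"), (3, "y")))

// Result type is RDD[(Int, (String, Option[String]))];
// key 2 has no match in rdd2, so its right side is None
rdd1.leftOuterJoin(rdd2).collect().foreach(println)
// e.g. (1,(a,Some(x))), (2,(b,None)), (3,(c,Some(y)))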