 

Equivalent to left outer join in Spark

Is there a left outer join equivalent in Spark Scala? I understand there is a join operation, which is equivalent to a database inner join.

asked Apr 21 '14 by user3279189

People also ask

What is left outer join in Spark?

A left outer join produces a table with every row from the left table; rows without a matching key in the right table get null values in the columns that come from the right table.
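
As a minimal sketch of what that looks like with the Spark Scala DataFrame API (the customers/orders names and sample values here are hypothetical, not from the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("LeftOuterJoinExample").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data
val customers = Seq((1, "Alice"), (2, "Bob"), (3, "Carol")).toDF("id", "name")
val orders = Seq((1, "book"), (3, "lamp")).toDF("customer_id", "item")

// Every customer row is kept; Bob (id 2) has no order, so his
// customer_id and item columns come back as null.
customers.join(orders, customers("id") === orders("customer_id"), "left_outer").show()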

What is left anti join PySpark?

The left anti join in PySpark uses the same join API, but it returns only the columns of the left DataFrame, and only for rows that have no match in the right DataFrame.
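
A quick sketch, reusing the hypothetical customers and orders DataFrames from the snippet above; "left_anti" is the join type string accepted by both the Scala and PySpark DataFrame APIs:

// Left anti join: only left-side rows with no match on the right,
// and only the left-side columns.
customers.join(orders, customers("id") === orders("customer_id"), "left_anti").show()
// Only Bob (id 2), the customer with no matching order, is returned.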

What is === in PySpark?

The triple equals operator === is not a built-in Scala operator (libraries such as ScalaTest define it as a type-safe equals). Spark defines === as a method on Column that compares the column on the left with the object on the right, producing a new Column that evaluates to a boolean for each row.
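
A small illustration in Spark Scala, assuming the SparkSession and implicits from the earlier sketch (the people DataFrame is made-up sample data):

// people("age") === 30 does not evaluate to a Scala Boolean; it builds a
// Column expression that Spark evaluates to true/false per row.
val people = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
val thirty = people.filter(people("age") === 30)
thirty.show()  // only the Alice row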


2 Answers

Spark Scala does support left outer join. Have a look at the pair RDD API here: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.api.java.JavaPairRDD

Usage is quite simple:

rdd1.leftOuterJoin(rdd2)
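
For a fuller picture, here is a minimal self-contained sketch (the names and data are hypothetical); note that values from the right RDD come back wrapped in Option:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("leftOuterJoin").setMaster("local[*]"))

val rdd1 = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
val rdd2 = sc.parallelize(Seq((1, "x"), (3, "y")))

// Result type: RDD[(Int, (String, Option[String]))]
// (1,(a,Some(x))), (2,(b,None)), (3,(c,Some(y)))
rdd1.leftOuterJoin(rdd2).collect().foreach(println)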
answered Sep 26 '22 by MARK


It is as simple as rdd1.leftOuterJoin(rdd2), but you have to make sure both RDDs are in the form of (key, value) pairs, i.e. every element of each RDD is a tuple.
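
As a sketch of that point: if the RDDs do not already hold (key, value) tuples, keyBy (or a map) can put them in that shape first. The case classes and data below are hypothetical, and sc is assumed to be an existing SparkContext (for example the one from the sketch in the first answer, or the one provided by spark-shell):

case class User(id: Int, name: String)
case class Order(userId: Int, item: String)

val usersRdd = sc.parallelize(Seq(User(1, "Alice"), User(2, "Bob")))
val ordersRdd = sc.parallelize(Seq(Order(1, "book")))

// keyBy turns each RDD into (key, value) pairs so leftOuterJoin applies.
// Result: RDD[(Int, (User, Option[Order]))]; Bob joins to None.
usersRdd.keyBy(_.id).leftOuterJoin(ordersRdd.keyBy(_.userId)).collect().foreach(println)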

answered Sep 25 '22 by Thang Tran