Joining two DataFrames in Spark SQL and selecting columns of only one

I have two DataFrames in Spark SQL (D1 and D2).

I am trying to inner join them with D1.join(D2, "some column") and get back only the columns of D1, not the complete joined data set.

D1 and D2 have the same columns.

Could someone please help me with this?

I am using Spark 1.6.

Avi asked Aug 02 '16 13:08




3 Answers

Let's say you want to join on the "id" column. Then you could write:

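// The implicits import brings the $"..." column syntax into scope.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
// Alias both sides so identically named columns can be told apart, then keep only d1's columns.
d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id").select($"d1.*")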
cheseaux answered Oct 05 '22 15:10



As an alternative, you could also do the following without adding aliases:

d1.join(d2, d1("id") === d2("id"))
  .select(d1.columns.map(c => d1(c)): _*)
nsanglar answered Oct 05 '22 16:10



You could use left_semi:

d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id", "left_semi")

A semi join returns only the rows from the left dataset for which the join condition is met, and the result contains only the left dataset's columns.

There is also another interesting join type, left_anti, which works similarly to left_semi but returns only the rows for which the condition is not met.
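
To make the difference between these join types concrete, here is a minimal sketch that models them with plain Scala collections. It is not Spark code and does not use the DataFrame API. The names JoinSemantics, Rec, d1 and d2 and the sample data are made up for illustration, and the sketch assumes the right side may contain the same id more than once.

```scala
// Plain-Scala model of the join semantics; this is NOT Spark code.
// JoinSemantics, Rec and the sample data are hypothetical.
object JoinSemantics {
  case class Rec(id: Int, name: String)

  val d1 = Seq(Rec(1, "a"), Rec(2, "b"), Rec(3, "c"))
  // d2 contains id 1 twice, so an inner join matches Rec(1, "a") twice.
  val d2 = Seq(Rec(1, "x"), Rec(1, "y"), Rec(3, "z"))

  // Inner join on id, keeping only the left record:
  // one output record per matching (left, right) pair.
  val innerLeftOnly: Seq[Rec] =
    for (l <- d1; r <- d2 if l.id == r.id) yield l

  // Semi join: each left record at most once, if any match exists.
  val semi: Seq[Rec] = d1.filter(l => d2.exists(_.id == l.id))

  // Anti join: left records that have no match at all.
  val anti: Seq[Rec] = d1.filterNot(l => d2.exists(_.id == l.id))
}
```

In this model the inner join returns Rec(1, "a") twice because d2 has two records with id 1, while the semi join returns it once and the anti join returns only Rec(2, "b"). If d2 can contain duplicate keys and you only want rows of d1, that is a reason to prefer left_semi over an inner join followed by a select.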

Krzysztof Atłasik answered Oct 05 '22 15:10
