Join two data frames, select all columns from one and some columns from the other

Let's say I have a Spark data frame df1 with several columns (including the column id) and a data frame df2 with two columns, id and other.

Is there a way to replicate the following command:

sqlContext.sql("SELECT df1.*, df2.other FROM df1 JOIN df2 ON df1.id = df2.id") 

by using only pyspark functions such as join(), select() and the like?

I have to implement this join in a function and I don't want to be forced to have sqlContext as a function parameter.

Francesco Sambo asked Mar 21 '16 13:03


2 Answers

The asterisk (*) works with an alias. For example:

df1 = df1.alias('df1')
df2 = df2.alias('df2')

# 'df1.*' expands to every column of df1; 'df2.other' adds the single column wanted from df2
df1.join(df2, df1.id == df2.id).select('df1.*', 'df2.other')
maxcnunes answered Sep 22 '22 06:09


Not sure if this is the most efficient way, but it worked for me:

from pyspark.sql.functions import col

# Alias both frames so their columns can be referred to as 'a.<name>' and 'b.<name>'
(
    df1.alias('a')
    .join(df2.alias('b'), col('b.id') == col('a.id'))
    .select([col('a.' + xx) for xx in df1.columns] + [col('b.other1'), col('b.other2')])
)

The trick is in:

[col('a.' + xx) for xx in df1.columns]: all columns of df1 (aliased as a)
[col('b.other1'), col('b.other2')]: some columns of df2 (aliased as b)
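
Since the question asks for a function that does not take sqlContext as a parameter, here is a minimal sketch of one possible wrapper. It uses a different technique from the alias approach above: it trims the second frame to the key plus the wanted columns and then joins on the column name. The name join_with_extra_columns and its parameters are made up for illustration and are not part of PySpark.

```python
from typing import TYPE_CHECKING, Sequence

if TYPE_CHECKING:  # only needed for type hints; no SparkSession or sqlContext is used
    from pyspark.sql import DataFrame


def join_with_extra_columns(
    left: "DataFrame", right: "DataFrame", key: str, extra_cols: Sequence[str]
) -> "DataFrame":
    """Hypothetical helper: all columns of `left` plus `extra_cols` from `right`."""
    # Trim `right` to the join key and the wanted columns before joining,
    # so no other column of `right` can reach the result.
    trimmed = right.select(key, *extra_cols)
    # Joining on a column *name* (not a Column expression) keeps a single
    # copy of the key column in the output.
    return left.join(trimmed, on=key, how="inner")
```

For the frames in the question this would be called as join_with_extra_columns(df1, df2, "id", ["other"]). Two caveats: with a name-based join Spark places the key column first in the result, so the column order can differ from df1.*, and the sketch assumes the names in extra_cols do not already exist in the first frame.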
Pablo Estevez answered Sep 24 '22 06:09