Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas left outer join multiple dataframes on multiple columns

I am new to using DataFrame and I would like to know how to perform a SQL equivalent of left outer join on multiple columns on a series of tables

Example:

df1:  Year    Week    Colour    Val1  2014       A       Red      50 2014       B       Red      60 2014       B     Black      70 2014       C       Red      10 2014       D     Green      20  df2: Year    Week    Colour    Val2 2014       A     Black      30 2014       B     Black     100 2014       C     Green      50 2014       C       Red      20 2014       D       Red      40  df3: Year    Week    Colour    Val3 2013       B       Red      60 2013       C     Black      80 2013       B     Black      10 2013       D     Green      20 2013       D       Red      50 

Essentially I want to do something like this SQL code (Notice that df3 is not joined on Year):

SELECT df1.*, df2.Val2, df3.Val3 FROM df1   LEFT OUTER JOIN df2     ON df1.Year = df2.Year     AND df1.Week = df2.Week     AND df1.Colour = df2.Colour   LEFT OUTER JOIN df3     ON df1.Week = df3.Week     AND df1.Colour = df3.Colour 

The result should look like:

Year    Week    Colour    Val1    Val2    Val3 2014       A       Red      50    Null    Null 2014       B       Red      60    Null      60 2014       B     Black      70     100    Null 2014       C       Red      10      20    Null 2014       D     Green      20    Null    Null 

I have tried using merge and join but can't figure out how to do it on multiple tables and when there are multiple joints involved. Could someone help me on this please?

Thanks

like image 770
user3311225 Avatar asked Feb 14 '14 18:02

user3311225


People also ask

How do I join 3 tables in pandas?

We can use either pandas. merge() or DataFrame. merge() to merge multiple Dataframes. Merging multiple Dataframes is similar to SQL join and supports different types of join inner , left , right , outer , cross .

What is the output of left Joining below 2 tables in pandas?

A left join, or left merge, keeps every row from the left dataframe. Result from left-join or left-merge of two dataframes in Pandas. Rows in the left dataframe that have no corresponding join value in the right dataframe are left with NaN values.


2 Answers

Merge them in two steps, df1 and df2 first, and then the result of that to df3.

In [33]: s1 = pd.merge(df1, df2, how='left', on=['Year', 'Week', 'Colour']) 

I dropped year from df3 since you don't need it for the last join.

In [39]: df = pd.merge(s1, df3[['Week', 'Colour', 'Val3']],                        how='left', on=['Week', 'Colour'])  In [40]: df Out[40]:     Year Week Colour  Val1  Val2 Val3 0  2014    A    Red    50   NaN  NaN 1  2014    B    Red    60   NaN   60 2  2014    B  Black    70   100   10 3  2014    C    Red    10    20  NaN 4  2014    D  Green    20   NaN   20  [5 rows x 6 columns] 
like image 139
TomAugspurger Avatar answered Oct 03 '22 02:10

TomAugspurger


One can also do this with a compact version of @TomAugspurger's answer, like so:

df = df1.merge(df2, how='left', on=['Year', 'Week', 'Colour']).merge(df3[['Week', 'Colour', 'Val3']], how='left', on=['Week', 'Colour']) 
like image 33
FatihAkici Avatar answered Oct 03 '22 03:10

FatihAkici