Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas left outer join exclusion

How can I do a left outer join, excluding the intersection, in Pandas?

I have 2 pandas dataframes

df1 = pd.DataFrame(data = {'col1' : ['finance', 'finance', 'finance', 'accounting', 'IT'], 'col2' : ['az', 'bh', '', '', '']}) 
df2 = pd.DataFrame(data = {'col1' : ['finance', 'finance', 'finance', 'finance', 'finance'], 'col2' : ['', 'az', '', '', '']})

df1

    col1    col2
0   finance az
1   finance bh
2   finance 
3   accounting  
4   IT  

df2

    col1    col2
0   finance 
1   finance az
2   finance 
3   finance 
4   finance 

As you can see the dataframe has blank values as well. I tried using the example and its not giving me the result I want.

common = df1.merge(df2,on=['col1','col2'])
df3=df1[(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))]

I want the output to look something like

    col1    col2
3   accounting  
4   IT  
like image 446
StatguyUser Avatar asked Jun 20 '17 17:06

StatguyUser


2 Answers

Pandas left outer join exclusion can be achieved by setting pandas merge's indicator=True. Then filter by the indicator in _merge column.

df=pd.merge(df1,df2[['col1']],on=['col1'],how="outer",indicator=True)
df=df[df['_merge']=='left_only']
# this following line is just formating
df = df.reset_index()[['col1', 'col2']] 

Output:

col1    col2
0   accounting  
1   IT  

==================================

====The following is an example showing the mechanism====

df1 = pd.DataFrame({'key1': ['0', '1'],
                     'key2': [-1, -1],
                     'A': ['A0', 'A1'],
                     })


df2 = pd.DataFrame({'key1': ['0', '1'],
                      'key2': [1, -1], 
                    'B': ['B0', 'B1']
                     })

:

df1

Output:

    A   key1    key2
0   A0  0       -1
1   A1  1       -1

:

df2

Output:

    B   key1    key2
0   B0  0       1
1   B1  1       -1

:

df=pd.merge(df1,df2,on=['key1','key2'],how="outer",indicator=True)

:

Output:

     A  key1    key2    B   _merge
0   A0  0   -1  NaN left_only
1   A1  1   -1  B1  both
2   NaN 0   1   B0  right_only

:With the above indicators in the _merge column. you can select rows in one dataframe but not in another.

df=df[df['_merge']=='left_only']
df

Output:

    A   key1    key2    B   _merge
0   A0  0   -1  NaN left_only
like image 92
Bin Avatar answered Oct 19 '22 11:10

Bin


A one liner for this based on Bin's answer may be:

df=pd.merge(df1,df2[['col1']],on=['col1'],how="outer",indicator=True).query('_merge=="left_only"')
like image 21
Aidan Wood Avatar answered Oct 19 '22 11:10

Aidan Wood