 

How to subtract rows of one pandas data frame from another?

The operation that I want to do is similar to a merge. For example, with an inner merge we get a data frame that contains rows that are present in the first AND the second data frame. With an outer merge we get a data frame that contains rows that are present EITHER in the first OR in the second data frame.

What I need is a data frame that contains rows that are present in the first data frame AND NOT present in the second one. Is there a fast and elegant way to do it?
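For reference, here is a minimal sketch of the inner and outer merges described above, on two small hypothetical frames (the names left, right, key, x and y are illustrative only):

```python
import pandas as pd

# Two small hypothetical frames that share a "key" column
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Inner merge: only keys present in BOTH frames
inner = left.merge(right, on="key", how="inner")
print(inner)

# Outer merge: keys present in EITHER frame; missing values become NaN
outer = left.merge(right, on="key", how="outer")
print(outer)
```

The result the question asks for would be just the row with key "a": present in left but absent from right.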

Roman asked Apr 25 '14 at 04:04



2 Answers

Consider the following:

  1. df_one is the first DataFrame
  2. df_two is the second DataFrame

Present in First DataFrame and Not in Second DataFrame

Solution, by index:

df = df_one[~df_one.index.isin(df_two.index)]

index can be replaced by whichever column you want to base the exclusion on, e.g. df_one[~df_one['col'].isin(df_two['col'])]. In the example above, I've used the index as the key shared by both DataFrames.
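A runnable sketch of both variants on small hypothetical frames (df_one, df_two and the id column are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical frames; index labels identify the same row in both frames
df_one = pd.DataFrame({"id": [1, 2, 3, 4], "val": ["a", "b", "c", "d"]},
                      index=[10, 20, 30, 40])
df_two = pd.DataFrame({"id": [2, 4], "val": ["b", "d"]}, index=[20, 40])

# Keep rows of df_one whose index label does not appear in df_two's index
by_index = df_one[~df_one.index.isin(df_two.index)]

# Same idea, matching on a column ("id") instead of the index
by_column = df_one[~df_one["id"].isin(df_two["id"])]

print(by_index)
print(by_column)
```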

You can also build a more complex condition as a boolean pandas.Series and use it to filter df_one in the same way.
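One way to build such a boolean Series when a row is identified by more than one column is to test each (Team, Year) pair against the set of pairs in df_two. This is a sketch with hypothetical data; keys_two, in_two and naive are illustrative names:

```python
import pandas as pd

# Hypothetical frames keyed on the (Team, Year) pair
df_one = pd.DataFrame({"Team": ["Hawks", "Nets", "Heat", "Nets"],
                       "Year": [2001, 1988, 2004, 2004]})
df_two = pd.DataFrame({"Team": ["Heat", "Nets"], "Year": [2004, 1987]})

# Set of (Team, Year) pairs present in df_two
keys_two = set(zip(df_two["Team"], df_two["Year"]))

# Boolean Series aligned with df_one's index: True where the pair is in df_two
in_two = pd.Series(
    [pair in keys_two for pair in zip(df_one["Team"], df_one["Year"])],
    index=df_one.index,
)
result = df_one[~in_two]
print(result)

# Pitfall: combining per-column isin checks tests each column separately,
# so ("Nets", 2004) is wrongly treated as present in df_two
naive = df_one["Team"].isin(df_two["Team"]) & df_one["Year"].isin(df_two["Year"])
print(df_one[~naive])
```

The pair-based mask keeps ("Nets", 2004) because that exact pair never occurs in df_two, while the per-column mask drops it.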

Chirag Chhatbar answered Sep 29 '22 at 19:09



How about something like the following?

print(df1)

    Team  Year  foo
0   Hawks  2001    5
1   Hawks  2004    4
2    Nets  1987    3
3    Nets  1988    6
4    Nets  2001    8
5    Nets  2000   10
6    Heat  2004    6
7  Pacers  2003   12

print(df2)

    Team  Year  foo
0  Pacers  2003   12
1    Heat  2004    6
2    Nets  1988    6
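To reproduce the example, the two frames printed above can be built like this (the values are copied from the printed output):

```python
import pandas as pd

# Rebuild df1 and df2 exactly as printed above
df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})
print(df1)
print(df2)
```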

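As long as there is a non-key column with the same name in both frames, you can let the suffixes that merge adds (_x and _y) do the work: after a left merge, rows of df1 with no match in df2 have NaN in the _y column. If there is no common non-key column, you could temporarily create one (df1['common'] = 1 and df2['common'] = 1) and check common_y instead. Here foo is the shared column: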

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])

     Team  Year  foo_x  foo_y
0  Hawks  2001      5    NaN
1  Hawks  2004      4    NaN
2   Nets  1987      3    NaN
4   Nets  2001      8    NaN
5   Nets  2000     10    NaN
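A variant of the same left merge that needs neither a shared non-key column nor a temporary one uses the indicator parameter of DataFrame.merge (available since pandas 0.17). This is a sketch on the df1/df2 data shown above; merged and only_df1 are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({"Team": ["Pacers", "Heat", "Nets"],
                    "Year": [2003, 2004, 1988], "foo": [12, 6, 6]})

# Merge only on the key columns so no _x/_y suffixes are produced;
# indicator=True adds a "_merge" column: "left_only", "right_only" or "both"
merged = df1.merge(df2[["Team", "Year"]], on=["Team", "Year"],
                   how="left", indicator=True)

# Rows found only in df1, with the helper column dropped afterwards
only_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_df1)
```

Because only matched rows can be duplicated by a left merge, the "left_only" rows come out once each even if df2 contains repeated key pairs.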

Or you can use isin, but you would have to create a single combined key:

df1['key'] = df1['Team'] + df1['Year'].astype(str)
df2['key'] = df2['Team'] + df2['Year'].astype(str)
print(df1[~df1.key.isin(df2.key)])

    Team  Year  foo        key
0  Hawks  2001    5  Hawks2001
1  Hawks  2004    4  Hawks2004
2   Nets  1987    3   Nets1987
4   Nets  2001    8   Nets2001
5   Nets  2000   10   Nets2000
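Concatenated string keys can collide (for example ("A1", 23) and ("A", 123) both become "A123"). A sketch of the same isin approach without a string key, using a MultiIndex of (Team, Year) pairs built with set_index; keys, idx1, idx2 and result are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({"Team": ["Pacers", "Heat", "Nets"],
                    "Year": [2003, 2004, 1988], "foo": [12, 6, 6]})

keys = ["Team", "Year"]
# MultiIndex of (Team, Year) tuples for each frame; Year stays numeric
idx1 = df1.set_index(keys).index
idx2 = df2.set_index(keys).index

# Boolean array: True where df1's pair also appears in df2
mask = idx1.isin(idx2)
result = df1[~mask]
print(result)
```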
Karl D. answered Sep 29 '22 at 19:09
