Removing intersection between data frame based on multiple columns

Tags:

python

pandas

I have these 2 data frames:

df_test
  dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id  \
0            -1            -1            -1            -1            -1   
1    1177314888     238198786    5770904146     133207291         Exact   
2    1177314888     238198786    5770904266   18395155770         Exact   
3    1177314888     238198786    5770904266   19338210057         Exact   
4    1177314888     238198786    5770904266   30907903234         Exact 

and

df_merge
  dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id  \
0            -1            -1            -1            -1            -1   
1    1177314888     238198786    5770904146     133207291         Exact 

I want to remove everything that is inside df_merge from df_test, based on the combinations of dimension1_id, dimension2_id, dimension3_id, dimension4_id and dimension5_id.

This is my code:

df_test = df_test[
    (df_test['dimension5_id'].isin(df_merge.dimension5_id) == False) &
    (df_test['dimension4_id'].isin(df_merge.dimension4_id) == False) &
    (df_test['dimension3_id'].isin(df_merge.dimension3_id) == False) &
    (df_test['dimension2_id'].isin(df_merge.dimension2_id) == False) &
    (df_test['dimension1_id'].isin(df_merge.dimension1_id) == False)
]

But this code returns an empty data frame. How can I remove just the first and second rows from df_test?

Filipe Ferminiano asked May 16 '16 04:05


1 Answer

You can use logical indexing to mask the rows you want by applying a direct comparison. In this case, you can check for values in df_test which are in df_merge:

df_test.isin(df_merge)

The resulting logical index acts as a mask:

  dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id      \
0          True          True          True          True          True   True
1          True          True          True          True          True   True
2         False         False         False         False         False  False
3         False         False         False         False         False  False
4         False         False         False         False         False  False
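
One caveat worth checking before relying on this mask: when isin is given another DataFrame, pandas compares cell by cell after aligning on both the index and the column labels, so a row only matches if it sits under the same index label in both frames. The sketch below illustrates this with a shortened copy of the question's data. The frames are rebuilt by hand, storing dimension5_id as strings is an assumption, and df_merge_shifted is a hypothetical name introduced only for this example.

```python
import pandas as pd

cols = ["dimension1_id", "dimension2_id", "dimension3_id",
        "dimension4_id", "dimension5_id"]

# Shortened, hand-built copy of the question's df_test (dtypes are assumed).
df_test = pd.DataFrame([
    [-1, -1, -1, -1, "-1"],
    [1177314888, 238198786, 5770904146, 133207291, "Exact"],
    [1177314888, 238198786, 5770904266, 18395155770, "Exact"],
], columns=cols)

# df_merge holds the first two rows under the same index labels (0 and 1).
df_merge = df_test.iloc[:2].copy()

# Collapse the cell-wise mask to one boolean per row.
aligned = df_test.isin(df_merge).all(axis=1)

# Same values, but relabelled 10 and 11: nothing lines up any more.
df_merge_shifted = df_merge.copy()
df_merge_shifted.index = [10, 11]
shifted = df_test.isin(df_merge_shifted).all(axis=1)

print(aligned.tolist())
print(shifted.tolist())
```

If the two frames do not share index labels for the matching rows, the mask comes back all False and nothing is removed.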

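True values mark cells that match df_merge. Indexing df_test with the cell-wise mask directly would keep every row and only replace the matched cells with NaN, so first collapse the mask to one value per row with .all(axis=1), then negate it with ~ to keep only the rows of df_test that are not in df_merge:

df_test[~df_test.isin(df_merge).all(axis=1)]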
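
If the matching rows cannot be assumed to share index labels, an alternative is a left merge with indicator=True, which compares the five columns as a combination and ignores the index. This is a minimal sketch rather than the method used in the answer above. The frames are rebuilt by hand from the question, storing dimension5_id as strings is an assumption, and keys, merged and result are names introduced for illustration.

```python
import pandas as pd

keys = ["dimension1_id", "dimension2_id", "dimension3_id",
        "dimension4_id", "dimension5_id"]

# Hand-built copies of the question's frames (dtypes are assumed).
df_test = pd.DataFrame([
    [-1, -1, -1, -1, "-1"],
    [1177314888, 238198786, 5770904146, 133207291, "Exact"],
    [1177314888, 238198786, 5770904266, 18395155770, "Exact"],
    [1177314888, 238198786, 5770904266, 19338210057, "Exact"],
    [1177314888, 238198786, 5770904266, 30907903234, "Exact"],
], columns=keys)

# Deliberately different index labels: merge matches on values only.
df_merge = pd.DataFrame([
    [-1, -1, -1, -1, "-1"],
    [1177314888, 238198786, 5770904146, 133207291, "Exact"],
], columns=keys, index=[7, 8])

# Left merge on all five columns; the _merge column records where each
# row was found. drop_duplicates keeps df_test rows from being repeated.
merged = df_test.merge(df_merge.drop_duplicates(), on=keys,
                       how="left", indicator=True)

# Keep only rows that appear in df_test and not in df_merge.
result = merged.loc[merged["_merge"] == "left_only", keys]
print(result)
```

Note that merge returns a frame with a fresh 0..n-1 index, so the original index labels of df_test are not preserved in result.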

Anthony E answered Sep 28 '22 05:09