I have these 2 data frames:
df_test
  dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id  \
0            -1            -1            -1            -1            -1   
1    1177314888     238198786    5770904146     133207291         Exact   
2    1177314888     238198786    5770904266   18395155770         Exact   
3    1177314888     238198786    5770904266   19338210057         Exact   
4    1177314888     238198786    5770904266   30907903234         Exact 
and
df_merge
dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id  \
0            -1            -1            -1            -1            -1   
1    1177314888     238198786    5770904146     133207291         Exact 
I want to remove everything that is inside df_merge from df_test, based on the combinations of dimension1_id, dimension2_id, dimension3_id, dimension4_id and dimension5_id.
This is my code:
df_test = df_test[
(df_test['dimension5_id'].isin(df_merge.dimension5_id) == False) &
(df_test['dimension4_id'].isin(df_merge.dimension4_id) == False) &                              (df_test['dimension3_id'].isin(df_merge.dimension3_id) == False) &                                (df_test['dimension2_id'].isin(df_merge.dimension2_id) == False) &
(df_test['dimension1_id'].isin(df_merge.dimension1_id) == False) 
 ]
But this code returns a empty data frame. How can I just remove the first and second line from df_test?
You can use logical indexing to mask the rows you want by applying a direct comparison. In this case, you can check for values in df_test which are in df_merge:
df_test.isin(df_merge)
The resulting logical index acts as a mask:
dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id      \
0          True          True          True          True          True   True
1          True          True          True          True          True   True
2         False         False         False         False         False  False
3         False         False         False         False         False  False
4         False         False         False         False         False  False
True values map to matching rows so we can simply negate the index using ~ to return only the rows you in df_merge which are not df_test:
df_test[~df_test.isin(df_merge)]
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With