I have these 2 data frames:
df_test
dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id \
0 -1 -1 -1 -1 -1
1 1177314888 238198786 5770904146 133207291 Exact
2 1177314888 238198786 5770904266 18395155770 Exact
3 1177314888 238198786 5770904266 19338210057 Exact
4 1177314888 238198786 5770904266 30907903234 Exact
and
df_merge
dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id \
0 -1 -1 -1 -1 -1
1 1177314888 238198786 5770904146 133207291 Exact
I want to remove everything that is inside df_merge
from df_test
, based on the combinations of dimension1_id
, dimension2_id
, dimension3_id
, dimension4_id
and dimension5_id
.
This is my code:
df_test = df_test[
(df_test['dimension5_id'].isin(df_merge.dimension5_id) == False) &
(df_test['dimension4_id'].isin(df_merge.dimension4_id) == False) & (df_test['dimension3_id'].isin(df_merge.dimension3_id) == False) & (df_test['dimension2_id'].isin(df_merge.dimension2_id) == False) &
(df_test['dimension1_id'].isin(df_merge.dimension1_id) == False)
]
But this code returns a empty data frame. How can I just remove the first and second line from df_test?
You can use logical indexing to mask the rows you want by applying a direct comparison. In this case, you can check for values in df_test
which are in df_merge
:
df_test.isin(df_merge)
The resulting logical index acts as a mask:
dimension1_id dimension2_id dimension3_id dimension4_id dimension5_id \
0 True True True True True True
1 True True True True True True
2 False False False False False False
3 False False False False False False
4 False False False False False False
True
values map to matching rows so we can simply negate the index using ~
to return only the rows you in df_merge
which are not df_test
:
df_test[~df_test.isin(df_merge)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With