Here's a toy example that captures my problem. Any help please? Thanks!
import pandas as pd

d = {'a': [1, 1, 1, 2, 2, 2, 3, 3, 3],
     'b': [1, 2, 3, 1, 2, 3, 1, 2, 3]}
df = pd.DataFrame(d)
I want to drop the two rows where (a, b) is (1, 3) or (2, 1), aiming for this result:
result = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3, 3],
                       'b': [1, 2, 2, 3, 1, 2, 3]})
In reality, I would have an exclusion list that will be updated over time: excl = [[1, 3], [2, 1], [3, 4], ...]
This feels like firing a cannon when we should be able to just wave our hands, but:
df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'b': [1, 1, 2, 3, 1, 2, 3, 1, 2, 3]})
excl = [[1, 3], [2, 1]]

# Left-merge against the exclusion pairs with an indicator column;
# a row should be kept when it appears only on the left side.
keep = df.merge(pd.DataFrame(excl, columns=['a', 'b']),
                how='left', indicator=True)._merge == 'left_only'
gives me
In [91]: df.loc[keep]
Out[91]:
a b
0 1 1
1 1 1
2 1 2
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
(Note that I added a duplicate (1, 1) row as a sanity check.)
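One caveat with the merge approach: if the exclusion list itself ever picks up duplicate pairs, the left merge will emit extra rows and the boolean mask will no longer line up with df. A minimal guard, as a sketch (drop_duplicates is standard pandas; the names mirror the example above):
# Deduplicate the exclusion pairs first so the left merge cannot
# produce more rows than df and misalign the mask.
excl_df = pd.DataFrame(excl, columns=['a', 'b']).drop_duplicates()
keep = df.merge(excl_df, how='left', indicator=True)._merge == 'left_only'
df.loc[keep]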
Crazy method #2: use (effectively) a categorical encoding:
edf = pd.DataFrame(excl, columns=['a', 'b'])  # exclusion pairs, same column names as df
codes = pd.concat([df, edf], sort=False).groupby(["a", "b"]).ngroup()
keep = ~codes.iloc[:len(df)].isin(codes.iloc[len(df):])
df = df.loc[keep]
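To unpack why this works: ngroup gives every distinct (a, b) combination in the concatenated frame a single integer label, so a row of df should be dropped exactly when its label also appears among the trailing exclusion rows. A quick illustration for the toy data (the values in the comment are what ngroup is expected to produce with its default sorted group order):
# Each pair gets one id: (1,1) -> 0, (1,2) -> 1, (1,3) -> 2, (2,1) -> 3, ...
# The two exclusion rows at the end carry ids 2 and 3, so exactly the df rows
# with those ids are excluded by the isin test.
print(codes.tolist())   # expected: [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 2, 3]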
Convert the list of "forbidden" rows into a dataframe whose column names differ from the original dataframe's:
to_drop = pd.DataFrame(excl, columns=('c','d')) # Different column names!
Merge the two dataframes. There will be NaNs wherever there is a mismatch:
combined = df.merge(to_drop, how='outer', left_on=['a','b'], right_on=['c','d'])
Take any column that came from the second dataframe, find where the NaNs are, and use that boolean mask to select the valid rows from the first dataframe:
df[combined.isnull()['d']]
# a b
#0 1 1
#1 1 2
#4 2 2
#5 2 3
#6 3 1
#7 3 2
#8 3 3
You may see a warning:
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
You can disregard it here: the boolean mask carries the merged frame's index rather than df's, but in this example the two line up row for row, so the selection is still correct.
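If you would rather avoid the warning altogether, one variation (a sketch, not the only way) is to merge with how='left', so the result has exactly one row per row of df in the same order, and then index with a plain boolean array so pandas never tries to realign indexes:
# With how='left' (and no duplicate pairs in to_drop) the merged frame has one
# row per row of df, in df's order, so the NaN test lines up positionally.
combined = df.merge(to_drop, how='left', left_on=['a', 'b'], right_on=['c', 'd'])
df[combined['d'].isnull().to_numpy()]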