Here's a toy example that captures my problem. Any help please? Thanks!
import pandas as pd

d = {'a': [1, 1, 1, 2, 2, 2, 3, 3, 3],
     'b': [1, 2, 3, 1, 2, 3, 1, 2, 3]}
df = pd.DataFrame(d)
I want to drop the two rows where (a, b) is (1, 3) or (2, 1), aiming for this result:
result = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3, 3],
                       'b': [1, 2, 2, 3, 1, 2, 3]})
In reality, I would have an exclusion list that will be updated over time: excl = [[1, 3], [2, 1], [3, 4], ...]
This feels like firing a cannon when we should be able to just wave our hands, but:
df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'b': [1, 1, 2, 3, 1, 2, 3, 1, 2, 3]})
excl = [[1, 3], [2, 1]]

# Left-merge against the exclusion pairs with an indicator column;
# a row should be kept when it appears only on the left side.
keep = df.merge(pd.DataFrame(excl, columns=['a', 'b']),
                how='left', indicator=True)._merge == 'left_only'
gives me
In [91]: df.loc[keep]
Out[91]:
a b
0 1 1
1 1 1
2 1 2
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
(Note that I added a duplicate (1, 1) row as a sanity check.)
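One caveat with the merge approach: if the exclusion list itself ever picks up duplicate pairs, the left merge will emit extra rows and the boolean mask will no longer line up with df. A minimal guard, as a sketch (drop_duplicates is standard pandas; the names mirror the example above):
# Deduplicate the exclusion pairs first so the left merge cannot
# produce more rows than df and misalign the mask.
excl_df = pd.DataFrame(excl, columns=['a', 'b']).drop_duplicates()
keep = df.merge(excl_df, how='left', indicator=True)._merge == 'left_only'
df.loc[keep]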
Crazy method #2: use (effectively) a categorical encoding:
edf = pd.DataFrame(excl, columns=['a', 'b'])  # exclusion pairs, same column names as df
codes = pd.concat([df, edf], sort=False).groupby(["a", "b"]).ngroup()
keep = ~codes.iloc[:len(df)].isin(codes.iloc[len(df):])
df = df.loc[keep]
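To unpack why this works: ngroup gives every distinct (a, b) combination in the concatenated frame a single integer label, so a row of df should be dropped exactly when its label also appears among the trailing exclusion rows. A quick illustration for the toy data (the values in the comment are what ngroup is expected to produce with its default sorted group order):
# Each pair gets one id: (1,1) -> 0, (1,2) -> 1, (1,3) -> 2, (2,1) -> 3, ...
# The two exclusion rows at the end carry ids 2 and 3, so exactly the df rows
# with those ids are excluded by the isin test.
print(codes.tolist())   # expected: [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 2, 3]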
Convert the list of "forbidden" rows into a dataframe whose column names differ from the original dataframe's:
to_drop = pd.DataFrame(excl, columns=('c','d')) # Different column names!
Merge the two dataframes. There will be NaNs wherever there is a mismatch:
combined = df.merge(to_drop, how='outer', left_on=['a','b'], right_on=['c','d'])
Take any column that came from the second dataframe, find where the NaNs are, and use that boolean mask to select the valid rows from the first dataframe:
df[combined.isnull()['d']]
# a b
#0 1 1
#1 1 2
#4 2 2
#5 2 3
#6 3 1
#7 3 2
#8 3 3
You may see a warning:
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
You can disregard it here: the boolean mask carries the merged frame's index rather than df's, but in this example the two line up row for row, so the selection is still correct.
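If you would rather avoid the warning altogether, one variation (a sketch, not the only way) is to merge with how='left', so the result has exactly one row per row of df in the same order, and then index with a plain boolean array so pandas never tries to realign indexes:
# With how='left' (and no duplicate pairs in to_drop) the merged frame has one
# row per row of df, in df's order, so the NaN test lines up positionally.
combined = df.merge(to_drop, how='left', left_on=['a', 'b'], right_on=['c', 'd'])
df[combined['d'].isnull().to_numpy()]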