
Delete pandas group based on condition

I have a pandas DataFrame with several groups, and I would like to exclude every group in which a condition on a specific column is not met. For example, group B below should be deleted because it has a non-finite value (inf) in column "crit1".

I can drop individual columns based on a condition with df.loc[:, (df > 0).any(axis=0)], but that does not delete the whole group.

I can't work out the next step, which is applying the condition to a whole group.

name    crit1   crit2
A       0.3     4
A       0.7     6
B       inf     4
B       0.4     3 

So the result after this filtering (keeping only groups whose crit1 values are all finite) should be:

A     0.3     4
A     0.7     6
Don asked Aug 18 '16 13:08



1 Answer

You can use groupby and filter. For the example you give, check whether np.inf occurs anywhere in a group and keep only the groups where it does not:

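import pandas as pd
import numpy as np

# Sample data from the question
df = pd.DataFrame({"name": list("AABB"),
                   "crit1": [0.3, 0.7, np.inf, 0.4], "crit2": [4, 6, 4, 3]})
# Keep only the groups in which no value equals np.inf
df.groupby('name').filter(lambda g: (g != np.inf).all().all())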
#   name   crit1    crit2
# 0    A     0.3        4
# 1    A     0.7        6
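
To make explicit what filter does here: it calls the function once per group, passing that group's sub-DataFrame, and keeps all rows of every group for which the function returns True. The following is a minimal sketch that rebuilds the question's sample data; the names no_inf, kept and by_hand are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["A", "A", "B", "B"],
    "crit1": [0.3, 0.7, np.inf, 0.4],
    "crit2": [4, 6, 4, 3],
})

def no_inf(g):
    # True only if no crit1 value in this group equals +inf
    return bool((g["crit1"] != np.inf).all())

# filter() evaluates no_inf once per group and keeps the rows of passing groups
kept = df.groupby("name").filter(no_inf)

# The same selection written out by hand: iterate over the groups,
# keep the ones that pass, and stitch them back together
by_hand = pd.concat([g for _, g in df.groupby("name") if no_inf(g)])

print(kept)
```

Both kept and by_hand retain the original row labels, so the surviving rows can still be matched back to df.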

If the predicate applies to only one column, you can access that column on the group as g.crit1 (or g['crit1']), for example:

df.groupby('name').filter(lambda g: (g.crit1 != np.inf).all())
#   name   crit1    crit2
# 0    A     0.3        4
# 1    A     0.7        6
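
The question speaks of "non-number" values in general. If -inf and NaN should disqualify a group as well, one option is np.isfinite, which is False for all three. The sketch below assumes crit1 is a numeric column; group C is a hypothetical addition to the sample data to show the NaN case. The transform-based mask at the end is an alternative to filter, not part of the answer above, that avoids calling a Python function for every group.

```python
import numpy as np
import pandas as pd

# Question's data plus a hypothetical group C containing NaN
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "C", "C"],
    "crit1": [0.3, 0.7, np.inf, 0.4, np.nan, 0.1],
    "crit2": [4, 6, 4, 3, 5, 2],
})

# Keep groups whose crit1 values are all finite (no inf, -inf or NaN)
finite_only = df.groupby("name").filter(lambda g: np.isfinite(g["crit1"]).all())

# Alternative: build a row-level boolean mask.
# ok is True for each finite crit1 value; transform("all") broadcasts
# "every value in this row's group is finite" back onto each row.
ok = np.isfinite(df["crit1"])
mask = ok.groupby(df["name"]).transform("all")
masked = df[mask]

print(finite_only)
```

Both approaches select the same rows here; the mask version may be preferable on frames with many groups, since the per-group check is done by pandas rather than by a lambda.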
Psidom answered Nov 09 '22 08:11
