Pandas: Selecting rows for which groupby.sum() satisfies condition

In pandas I have a dataframe of the form:

>>> import pandas as pd  
>>> df = pd.DataFrame({'ID':[51,51,51,24,24,24,31], 'x':[0,1,0,0,1,1,0]})
>>> df

ID   x
51   0
51   1
51   0
24   0
24   1
24   1
31   0

For every 'ID', the value of 'x' is recorded several times; it is either 0 or 1. I want to select the rows of df whose 'ID' has 'x' equal to 1 at least twice.

For every 'ID', I can count the number of times 'x' is 1 with:

>>> df.groupby('ID')['x'].sum()

ID
24    2
31    0
51    1
Name: x, dtype: int64

But I don't know how to proceed from here. I would like the following output:

ID   x
24   0
24   1
24   1
DominikS asked Jun 13 '17 21:06



2 Answers

Use groupby with filter, which keeps all rows of every group for which the function returns True:

df.groupby('ID').filter(lambda s: s.x.sum()>=2)

Output:

   ID  x
3  24  0
4  24  1
5  24  1
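
For reference, here is a self-contained sketch of the same filter call with the threshold pulled out into a variable. The name min_ones is made up for this illustration and does not appear in the question; the data is the example frame from the question.

```python
import pandas as pd

df = pd.DataFrame({'ID': [51, 51, 51, 24, 24, 24, 31],
                   'x':  [0, 1, 0, 0, 1, 1, 0]})

min_ones = 2  # hypothetical name for the "at least twice" threshold

# filter() calls the function once per ID group (each group is a DataFrame)
# and keeps every row of the groups for which it returns True.
result = df.groupby('ID').filter(lambda g: g['x'].sum() >= min_ones)
print(result)
```

The rows come back in their original order and keep their original index labels, so the result can be joined back to df if needed.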
Scott Boston answered Oct 13 '22 00:10



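Alternatively, build a row-aligned boolean mask with transform:

df = pd.DataFrame({'ID':[51,51,51,24,24,24,31], 'x':[0,1,0,0,1,1,0]})
# transform('sum') repeats each ID's total on every row of that ID
df.loc[df.groupby('ID')['x'].transform('sum') >= 2]

Output:
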
   ID  x
3  24  0
4  24  1
5  24  1
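
The transform approach works because it returns one value per row rather than one per ID, so the comparison yields a mask that lines up with df. The sketch below makes that intermediate result visible and also shows one possible way to continue from the per-ID sums computed in the question, using isin. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ID': [51, 51, 51, 24, 24, 24, 31],
                   'x':  [0, 1, 0, 0, 1, 1, 0]})

# One total per row, aligned with df's index.
per_row_total = df.groupby('ID')['x'].transform('sum')
via_transform = df[per_row_total >= 2]

# Starting instead from the per-ID sums in the question:
sums = df.groupby('ID')['x'].sum()
keep_ids = sums[sums >= 2].index          # IDs whose total reaches the threshold
via_isin = df[df['ID'].isin(keep_ids)]

print(per_row_total.tolist())             # [1, 1, 1, 2, 2, 2, 0]
print(via_transform.equals(via_isin))     # True
```

Both routes select the same three rows for this data; which one reads better is largely a matter of taste.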
BENY answered Oct 12 '22 23:10
