In pandas I have a dataframe of the form:
>>> import pandas as pd  
>>> df = pd.DataFrame({'ID':[51,51,51,24,24,24,31], 'x':[0,1,0,0,1,1,0]})
>>> df
ID   x
51   0
51   1
51   0
24   0
24   1
24   1
31   0
For every 'ID', the value of 'x' is recorded several times; it is either 0 or 1. I want to select those rows from df whose 'ID' has 'x' equal to 1 at least twice.
For every 'ID', I can count the number of times 'x' is 1 with:
>>> df.groupby('ID')['x'].sum()
ID
24    2
31    0
51    1
But I don't know how to proceed from here. I would like the following output:
ID   x
24   0
24   1
24   1
Use groupby and filter, which keeps every row of each group for which the function returns True:
df.groupby('ID').filter(lambda s: s.x.sum()>=2)
Output:
   ID  x
3  24  0
4  24  1
5  24  1
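To make the mechanics explicit: filter calls the function once per 'ID', passing that group's sub-DataFrame, and keeps or drops the whole group based on the boolean it returns. Here is a minimal sketch of the same idea with a named function; `min_ones` and `has_enough_ones` are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'ID': [51, 51, 51, 24, 24, 24, 31],
                   'x': [0, 1, 0, 0, 1, 1, 0]})

min_ones = 2  # hypothetical name for the "at least twice" threshold


def has_enough_ones(group):
    # `group` is the sub-DataFrame holding all rows of a single ID.
    # Since x is only ever 0 or 1, its sum is the number of ones.
    return group['x'].sum() >= min_ones


# Rows of groups where the function returns True are kept,
# with their original index and order.
result_filter = df.groupby('ID').filter(has_enough_ones)
print(result_filter)
```

Because the function runs in Python once per group, this may be slower than a vectorized mask when there are very many distinct IDs.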
df = pd.DataFrame({'ID':[51,51,51,24,24,24,31], 'x':[0,1,0,0,1,1,0]})
# transform('sum') broadcasts each ID's sum back onto its rows, giving a row-aligned mask
df.loc[df.groupby('ID')['x'].transform('sum') >= 2]
Output:
   ID  x
3  24  0
4  24  1
5  24  1
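It is also possible to continue directly from the per-ID counts computed in the question. The sketch below selects the IDs whose count reaches the threshold and then keeps the matching rows with isin; `counts`, `keep_ids`, and `result_isin` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({'ID': [51, 51, 51, 24, 24, 24, 31],
                   'x': [0, 1, 0, 0, 1, 1, 0]})

# Number of ones per ID (the step already shown in the question).
counts = df.groupby('ID')['x'].sum()

# IDs for which x is 1 at least twice.
keep_ids = counts[counts >= 2].index

# Keep only the rows whose ID is in that set.
result_isin = df[df['ID'].isin(keep_ids)]
print(result_isin)
```

This gives the same rows as the transform-based mask, and the intermediate `counts` Series stays available if the per-ID totals are needed elsewhere.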