In pandas I have a dataframe of the form:
>>> import pandas as pd
>>> df = pd.DataFrame({'ID':[51,51,51,24,24,24,31], 'x':[0,1,0,0,1,1,0]})
>>> df
ID x
51 0
51 1
51 0
24 0
24 1
24 1
31 0
For every 'ID', the value of 'x' is recorded several times; it is either 0 or 1. I want to select those rows from df
whose 'ID' has 'x' equal to 1 at least twice.
For every 'ID', I can count the number of times 'x' is 1 with:
>>> df.groupby('ID')['x'].sum()
ID
24 2
31 0
51 1
But I don't know how to proceed from here. I would like the following output:
ID x
24 0
24 1
24 1
Use groupby and filter:
df.groupby('ID').filter(lambda s: s.x.sum()>=2)
Output:
ID x
3 24 0
4 24 1
5 24 1
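To see why this works: filter calls the function once per group, passing that group's rows as a DataFrame, and keeps every row of the groups for which the function returns True. A minimal sketch, assuming the df from the question; the helper name has_two_ones is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'ID': [51, 51, 51, 24, 24, 24, 31],
                   'x':  [0, 1, 0, 0, 1, 1, 0]})

def has_two_ones(group):
    # `group` holds all rows that share one ID.
    # Since x is only ever 0 or 1, its sum is the number of ones.
    return group['x'].sum() >= 2

# Keep all rows of every ID whose group passes the check.
result = df.groupby('ID').filter(has_two_ones)
print(result)
```

The original row index (3, 4, 5) is preserved; chain .reset_index(drop=True) if a fresh index is wanted.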
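Alternatively, use transform to broadcast each group's sum back onto its rows, then select with a boolean mask:
df = pd.DataFrame({'ID':[51,51,51,24,24,24,31], 'x':[0,1,0,0,1,1,0]})
df.loc[df.groupby('ID')['x'].transform('sum') >= 2]
Output: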
ID x
3 24 0
4 24 1
5 24 1
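It may help to look at the intermediate values. transform returns a Series aligned with df's index, where each row carries the total for its own ID, so comparing it with 2 gives a row-wise boolean mask. A small sketch, assuming the same df; the names ones_per_id, mask and selected are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'ID': [51, 51, 51, 24, 24, 24, 31],
                   'x':  [0, 1, 0, 0, 1, 1, 0]})

# One value per original row: the number of ones recorded for that row's ID.
ones_per_id = df.groupby('ID')['x'].transform('sum')

# True for rows whose ID has at least two ones.
mask = ones_per_id >= 2

selected = df[mask]
print(ones_per_id.tolist())
print(selected)
```

Because the mask is computed with vectorized operations rather than a Python function call per group, this variant tends to scale better than filter with a lambda when there are many distinct IDs.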