My dataframe looks like this:
ID colA
1 B
1 D
2 B
2 D
2 C
I have to return all rows after the last occurrence of event B in each group. The expected output is:
ID colA
1 D
2 D
2 C
I tried
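is_b = df['colA'].str.contains('B')
a = is_b.groupby(df['ID'])
# the difference is also 0 on the last B row itself, so exclude B rows explicitly
b = df[(a.transform('sum') - a.cumsum()).eq(0) & ~is_b]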
and it's working fine so far. I am just wondering whether there is an alternative approach to achieve this.
Reverse your rows first (this is important), so that the cumulative sum counts the B's at or after each row instead of before it. Then call groupby and cumsum, and keep all rows whose (reversed) cumsum value is equal to zero.
df[df.colA.eq('B')[::-1].astype(int).groupby(df.ID).cumsum().eq(0)]
   ID colA
1   1    D
3   2    D
4   2    C
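To see why the reversal matters, here is a minimal, self-contained sketch of the same idea on the sample data from the question. The names is_b, b_at_or_after and mask are only illustrative, and the sort_index() call is an addition (not part of the one-liner above) that puts the mask back in the original row order before indexing.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"ID": [1, 1, 2, 2, 2], "colA": ["B", "D", "B", "D", "C"]})

is_b = df["colA"].eq("B")

# Walk each group bottom-up: for every row, count the B's at or below it.
b_at_or_after = is_b[::-1].astype(int).groupby(df["ID"]).cumsum()

# Restore the original row order so the mask lines up with df exactly
# (an added step, assumed here for clarity).
mask = b_at_or_after.sort_index().eq(0)

result = df[mask]
print(result)
```

A row gets a count of zero only if neither it nor any later row in its group is a B, which is exactly "after the last B". On the sample data this keeps the rows with index 1, 3 and 4.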
You could do:
ix = (df.colA.eq('B')
        .cumsum()
        .groupby(df.ID)
        .apply(lambda x: x.loc[x.idxmax()+1:])
        .index.get_level_values(1))
df.loc[ix, :]
   ID colA
1   1    D
3   2    D
4   2    C
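Note that idxmax()+1 assumes the default integer index, and a group that contains no B at all would lose its first row. A position-based variant avoids both, shown here as a rough sketch rather than a drop-in replacement. rows_after_last is a hypothetical helper name, and leaving groups without any B unchanged is an assumption, since the question does not say what should happen to them.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"ID": [1, 1, 2, 2, 2], "colA": ["B", "D", "B", "D", "C"]})

def rows_after_last(group, value="B"):
    """Hypothetical helper: keep the rows after the last `value` in one group."""
    # Integer positions (not index labels) of the matching rows.
    hits = (group["colA"] == value).to_numpy().nonzero()[0]
    if len(hits) == 0:
        # Assumption: a group that never contains `value` is kept as-is.
        return group
    return group.iloc[hits[-1] + 1:]

result = pd.concat([rows_after_last(g) for _, g in df.groupby("ID")])
print(result)
```

Because it slices by position with iloc, this does not depend on what the index labels are. It does loop over the groups in Python, so it will likely be slower than the vectorized cumsum approaches on large frames.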