Currently I'm working with weekly data for different subjects, but it might have some long streaks without data, so, what I want to do, is to just keep the longest streak of consecutive weeks for every id
. My data looks like this:
id week
1 8
1 15
1 60
1 61
1 62
2 10
2 11
2 12
2 13
2 25
2 26
My expected output would be:
id week
1 60
1 61
1 62
2 10
2 11
2 12
2 13
I got a bit close, trying to mark with a 1 when week
==week.shift()+1
. The problem is this approach doesn't mark the first occurrence in a streak, and also I can't filter the longest one:
df.loc[ (df['id'] == df['id'].shift())&(df['week'] == df['week'].shift()+1),'streak']=1
This, according to my example, would bring this:
id week streak
1 8 nan
1 15 nan
1 60 nan
1 61 1
1 62 1
2 10 nan
2 11 1
2 12 1
2 13 1
2 25 nan
2 26 1
Any ideas on how to achieve what I want?
Try this:
df['consec'] = df.groupby(['id',df['week'].diff(-1).ne(-1).shift().bfill().cumsum()]).transform('count')
df[df.groupby('id')['consec'].transform('max') == df.consec]
Output:
id week consec
2 1 60 3
3 1 61 3
4 1 62 3
5 2 10 4
6 2 11 4
7 2 12 4
8 2 13 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With