I have a dataframe similar to the following:
import pandas as pd
data = {"Name":["Andrew","Andrew","Andrew","Andrew","Andrew","Andrew","Andrew", "Sam", "Sam", "Sam", "Sam", "Sam"], "PASS":[0, 1, 1, 0, 1, 1, 1, 0, 1, 1,0,1]}
df = pd.DataFrame(data=data)
Output
Name PASS
0 Andrew 0
1 Andrew 1
2 Andrew 1
3 Andrew 0
4 Andrew 1
5 Andrew 1
6 Andrew 1
7 Sam 0
8 Sam 1
9 Sam 1
10 Sam 0
11 Sam 1
I want to generate a dataframe which contains the largest consecutive passes for each student:
Name MAX_PASS
0 Andrew 3
1 Sam 2
I need a little help modifying the code that I have so far. The count
is outputting 0110110110
and the result = 2
. Which is not quite correct. I think I'm close but need some help to get over the finish line. Thanks.
count = ''
for i in range(len(df)-1):
if df.Name[i] == df.Name[i+1]:
if df.PASS[i] == 0:
count += "0"
else:
count += "1"
result = len(max(count.split('0')))
You can consider to adapt this answer
def max_strike_group(x, col):
x = x[col]
a = x != 0
out = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)
return out.max()
df.groupby("Name").apply(lambda x:max_strike_group(x, "PASS"))
Name
Andrew 3
Sam 2
dtype: int64
One option is to call on cumsum twice, the first time is to add up the 0s and 1s, the second time is to get the values after the reset:
TL-DR:
cum1 = df.groupby('Name').PASS.cumsum()
cum1 = np.where(cum1.shift() == cum1, cum1 * -1, df.PASS)
(df.assign(PASS = cum1,
max_pass = lambda df: df.groupby('Name').cumsum())
.groupby('Name')
.max_pass
.max()
)
Name
Andrew 3
Sam 2
Name: max_pass, dtype: int64
Explanation:
# first cumulative sum
cum1 = df.groupby('Name').PASS.cumsum()
cum1
0 0
1 1
2 2
3 2
4 3
5 4
6 5
7 0
8 1
9 2
10 2
11 3
Name: PASS, dtype: int64
# look for rows where the reset should occur
cum1 = np.where(cum1.shift() == cum1, cum1 * -1, df.PASS)
cum1
array([ 0, 1, 1, -2, 1, 1, 1, 0, 1, 1, -2, 1])
# build the max_pass column
# with the second cumsum and groupby
# before grouping again to get the max
(df.assign(PASS = cum1,
max_pass = lambda df: df.groupby('Name').cumsum())
.groupby('Name')
.max_pass
.max()
)
Name
Andrew 3
Sam 2
Name: max_pass, dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With