I'm trying to create a new dataframe column that acts as a running counter, which either resets to zero or is left unchanged ("passes") under certain conditions. Below is a simplified example of what I'm looking to accomplish. Let's say I'm trying to quit drinking coffee and I'm tracking the number of days in a row I've gone without drinking any. On days where I forgot to make a note of whether I drank coffee, I put "forgot", and my tally is not affected.
Below is how I'm currently accomplishing this, though I suspect there's a much more efficient way of going about it.
Thanks in advance!
import pandas as pd

Day = [1,2,3,4,5,6,7,8,9,10,11]
DrankCoffee = ['no','no','forgot','yes','no','no','no','no','no','yes','no']
df = pd.DataFrame(list(zip(Day,DrankCoffee)), columns=['Day','DrankCoffee'])
df['Streak'] = 0
s = 0
for (index,row) in df.iterrows():
    if row['DrankCoffee'] == 'no':
        s += 1
    if row['DrankCoffee'] == 'yes':
        s = 0
    else:
        pass
    df.loc[index,'Streak'] = s

You can use groupby.transform.
Within each streak, what you're looking for is something like this:
def my_func(group):
    return (group == 'no').cumsum()
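To see what this function does on a single streak, here is a small sketch run on a hand-made Series. The sample values and the name `one_streak` are made up for illustration and are not from the question's data. Rows equal to 'no' add one to the count, while anything else (such as 'forgot') leaves it unchanged.

```python
import pandas as pd

def my_func(group):
    # True where the value is 'no', False otherwise;
    # cumsum then counts how many 'no' rows have been seen so far
    return (group == 'no').cumsum()

# Hypothetical single streak: starts on a 'yes' day and contains one 'forgot'
one_streak = pd.Series(['yes', 'no', 'no', 'forgot', 'no'])
counts = my_func(one_streak)
print(counts.tolist())  # [0, 1, 2, 2, 3]
```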
You can split the rows into separate streaks with a simple comparison and cumsum, so that every 'yes' starts a new group:
streak = (df['DrankCoffee'] == 'yes').cumsum()
0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     1
8     1
9     2
10    2
Then apply the transform, using that Series as the grouping key:
df['Streak'] = df.groupby(streak)['DrankCoffee'].transform(my_func)
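Putting the pieces together, here is a minimal end-to-end sketch on the data from the question. As a variation, it replaces the helper function with a direct `groupby(...).cumsum()` on the boolean mask, which should give the same counts as the transform approach. The names `streak_id` and `is_no` are my own, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': range(1, 12),
    'DrankCoffee': ['no', 'no', 'forgot', 'yes', 'no', 'no',
                    'no', 'no', 'no', 'yes', 'no'],
})

# Every 'yes' bumps the group id, so each group starts on a 'yes' day
streak_id = (df['DrankCoffee'] == 'yes').cumsum()

# Only 'no' days extend the streak; 'forgot' and 'yes' contribute 0
is_no = (df['DrankCoffee'] == 'no').astype(int)

# Running count of 'no' days within each group
df['Streak'] = is_no.groupby(streak_id).cumsum()

print(df['Streak'].tolist())
# [1, 2, 2, 0, 1, 2, 3, 4, 5, 0, 1]
```

This matches what the loop in the question produces, without iterating over rows in Python.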