
Python Pandas: Create Column That Acts As A Conditional Running Variable

I'm trying to create a new dataframe column that acts as a running variable, one that resets to zero or "passes" (stays unchanged) under certain conditions. Below is a simplified example of what I'm looking to accomplish. Let's say I'm trying to quit drinking coffee and I'm tracking the number of days in a row I've gone without drinking any. On days where I forgot to note whether I drank coffee, I put "forgot", and my tally is not affected.

Below is how I'm currently accomplishing this, though I suspect there's a much more efficient way of going about it.

Thanks in advance!

import pandas as pd

Day = [1,2,3,4,5,6,7,8,9,10,11]  
DrankCoffee = ['no','no','forgot','yes','no','no','no','no','no','yes','no']

df = pd.DataFrame(list(zip(Day,DrankCoffee)), columns=['Day','DrankCoffee'])

df['Streak'] = 0  

s = 0

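for index, row in df.iterrows():
    if row['DrankCoffee'] == 'no':
        s += 1  # another coffee-free day: extend the streak
    elif row['DrankCoffee'] == 'yes':
        s = 0   # drank coffee: reset the streak
    # any other value (e.g. 'forgot') leaves the streak unchanged

    df.loc[index, 'Streak'] = s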

crowsnest asked Apr 25 '26 02:04


1 Answer

You can use groupby.transform.

Within each streak, what you're looking for is a running count of the 'no' days, something like this:

def my_func(group):
    return (group == 'no').cumsum()

You can separate the different streaks with a simple comparison and cumsum. Every 'yes' increments the counter, so it starts a new group:

streak = (df['DrankCoffee'] == 'yes').cumsum()
0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     1
8     1
9     2
10    2

Then apply the transform, grouping by that streak label:

df['Streak'] = df.groupby(streak)['DrankCoffee'].transform(my_func)
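
For reference, the pieces above can be put together into one self-contained script. This is only a sketch using the sample data from the question. The names `group_id` and `is_no` are illustrative and not part of the original answer, and it calls `groupby(...).cumsum()` directly instead of `transform(my_func)`, a small variation that should give the same result here.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Day': list(range(1, 12)),
    'DrankCoffee': ['no', 'no', 'forgot', 'yes', 'no', 'no',
                    'no', 'no', 'no', 'yes', 'no'],
})

# Every 'yes' bumps the counter, so each 'yes' row opens a new group.
# (group_id is an illustrative name, equivalent to `streak` above.)
group_id = (df['DrankCoffee'] == 'yes').cumsum()

# 1 for a coffee-free day, 0 for 'yes' or 'forgot'.
is_no = (df['DrankCoffee'] == 'no').astype(int)

# Running count of 'no' days inside each group; it restarts at every 'yes'
# and stays flat on 'forgot'.
df['Streak'] = is_no.groupby(group_id).cumsum()

print(df)
```

On this data the `Streak` column should match what the loop in the question produces, without iterating over rows in Python.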
Maarten Fabré answered Apr 27 '26 15:04


