Say I have the following DataFrame which has a 0/1 entry depending on whether something happened/didn't happen within a certain month.
import numpy as np
import pandas as pd
Y = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({'R': Y}, index=X)
R
2010-01-01 0
2010-02-01 0
2010-03-01 1
2010-04-01 1
2010-05-01 0
2010-06-01 0
2010-07-01 0
2010-08-01 0
2010-09-01 1
2010-10-01 1
2010-11-01 1
What I want is to create a second column that lists the number of months until the next occurrence of a 1 (with 0 for months that are themselves a 1).
That is, I need:
R F
2010-01-01 0 2
2010-02-01 0 1
2010-03-01 1 0
2010-04-01 1 0
2010-05-01 0 4
2010-06-01 0 3
2010-07-01 0 2
2010-08-01 0 1
2010-09-01 1 0
2010-10-01 1 0
2010-11-01 1 0
What I've tried: I haven't gotten far, but I'm able to fill in the first stretch, up to the first 1:
A = list(df.index)
T = df[df['R']==1]
a = df.index[0]
b = T.index[0]
c = A.index(b) - A.index(a)
df.loc[a:b, 'F'] = np.linspace(c,0,c+1)
R F
2010-01-01 0 2.0
2010-02-01 0 1.0
2010-03-01 1 0.0
2010-04-01 1 NaN
2010-05-01 0 NaN
2010-06-01 0 NaN
2010-07-01 0 NaN
2010-08-01 0 NaN
2010-09-01 1 NaN
2010-10-01 1 NaN
2010-11-01 1 NaN
EDIT: It probably would have been better to provide an original example that spanned multiple years:
Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))
df = pd.DataFrame({'R': Y},index = X)
Using the size() or count() method with pandas.DataFrame.groupby() generates the number of occurrences of each value present in a particular column of the DataFrame.
To count the number of occurrences in, e.g., a single column of a DataFrame, you can use the pandas value_counts() method. For example, df['condition'].value_counts() returns the frequency of each unique value in the column “condition”.
Here is my way
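s = df.R.cumsum()  # stays constant across each run of 0s
# count down within each group; only rows with R == 0 receive a value
df.loc[df.R == 0, 'F'] = s.groupby(s).cumcount(ascending=False) + 1
# assign back rather than calling fillna(inplace=True) on df.F, which may not update df in recent pandas
df['F'] = df['F'].fillna(0)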
df
Out[12]:
R F
2010-01-01 0 2.0
2010-02-01 0 1.0
2010-03-01 1 0.0
2010-04-01 1 0.0
2010-05-01 0 4.0
2010-06-01 0 3.0
2010-07-01 0 2.0
2010-08-01 0 1.0
2010-09-01 1 0.0
2010-10-01 1 0.0
2010-11-01 1 0.0
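To see why the cumsum grouping works, here is a minimal sketch of the same idea with the intermediate steps spelled out. It uses `where` instead of the `loc` assignment plus `fillna` (a variation chosen here so the column stays integer, not part of the answer above). Note that a trailing run of 0s with no later 1 would still be counted down to the end of the data.

```python
import pandas as pd

Y = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)

# The running total of R does not change across a run of 0s, so every
# run of 0s shares one group label (together with the 1 just before it).
s = df["R"].cumsum()

# Position counted from the end of each group (last row = 0), plus 1:
# for a row inside a run of 0s this is the number of rows until the next 1.
countdown = s.groupby(s).cumcount(ascending=False) + 1

# Keep the countdown only where R == 0; rows where R == 1 get 0.
df["F"] = countdown.where(df["R"].eq(0), 0)
print(df)
```

Because this counts rows, it equals the number of months only when the index has exactly one row per month, as in the example.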
Create a series containing your dates, mask this series where your R series is not equal to 1, bfill, and subtract!
u = df.index.to_series()
ii = u.where(df.R.eq(1)).bfill()
12 * (ii.dt.year - u.dt.year) + (ii.dt.month - u.dt.month)
2010-01-01 2
2010-02-01 1
2010-03-01 0
2010-04-01 0
2010-05-01 4
2010-06-01 3
2010-07-01 2
2010-08-01 1
2010-09-01 0
2010-10-01 0
2010-11-01 0
Freq: MS, dtype: int64
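As a rough check that the calendar arithmetic carries across a year boundary, the mask, bfill, and subtract steps can be applied to the multi-year example from the question's EDIT. This is only a sketch; the helper name `months_until_next` is made up for illustration.

```python
import pandas as pd

Y = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)


def months_until_next(r):  # hypothetical helper name
    dates = r.index.to_series()
    # Keep the date only on rows where R == 1, then back-fill so every
    # row carries the date of the next 1 (or its own date if it is a 1).
    next_hit = dates.where(r.eq(1)).bfill()
    # Whole calendar months between the two dates; rows with no later 1
    # have no date to fill from and come out as NaN.
    return 12 * (next_hit.dt.year - dates.dt.year) + (next_hit.dt.month - dates.dt.month)


df["F"] = months_until_next(df["R"])
print(df)
```

December 2010 and January 2011 come out as 2 and 1, since the next 1 falls in February 2011. Unlike the row-counting approaches, this works from the dates themselves, so it should not depend on the index having one row per month.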
Here is a way that worked for me. It is not as elegant as @user3483203's answer, but it does the job.
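df = df.reset_index()  # switch to an integer index so that j + 1 means "next row"
df['F'] = 0
for i in df.index:
    j = i
    # walk forward until the next 1 (or the end of the data)
    while j < len(df) and df.loc[j, 'R'] == 0:
        df.loc[i, 'F'] += 1
        j += 1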
df
################
Out[39]:
index R F
0 2010-01-01 0 2
1 2010-02-01 0 1
2 2010-03-01 1 0
3 2010-04-01 1 0
4 2010-05-01 0 4
5 2010-06-01 0 3
6 2010-07-01 0 2
7 2010-08-01 0 1
8 2010-09-01 1 0
9 2010-10-01 1 0
10 2010-11-01 1 0
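The forward scan above restarts for every row. If that becomes slow on long series, one possible alternative (a different technique from the answer above) is a single backwards pass that carries the distance to the most recently seen 1. The function name `rows_until_next_one` is hypothetical.

```python
import pandas as pd

Y = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)


def rows_until_next_one(values):  # hypothetical helper name
    out = [0] * len(values)
    gap = 0
    # Walk from the last row to the first, resetting the gap at every 1
    # and growing it by one for every 0.
    for k in range(len(values) - 1, -1, -1):
        gap = 0 if values[k] == 1 else gap + 1
        out[k] = gap
    return out


df["F"] = rows_until_next_one(df["R"].tolist())
print(df)
```

This keeps the original DatetimeIndex and visits each row once. Like the loop above, it counts rows, so a trailing run of 0s is counted to the end of the data.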
My take
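s = (df.R.diff().ne(0) | df.R.eq(1)).cumsum()  # one label per run of 0s and per individual 1
# count down within every run, then zero out the rows where R == 1, so that an isolated 0 still gets 1
s.groupby(s).transform(lambda g: np.arange(len(g), 0, -1)).where(df.R.eq(0), 0)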
2010-01-01 2
2010-02-01 1
2010-03-01 0
2010-04-01 0
2010-05-01 4
2010-06-01 3
2010-07-01 2
2010-08-01 1
2010-09-01 0
2010-10-01 0
2010-11-01 0
Freq: MS, Name: R, dtype: int64