Pandas: fill one column with count of # of obs between occurrences in a 2nd column

Tags:

python

pandas

Say I have the following DataFrame which has a 0/1 entry depending on whether something happened/didn't happen within a certain month.

import numpy as np
import pandas as pd

Y = [0,0,1,1,0,0,0,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))

df = pd.DataFrame({'R': Y},index = X)



            R
2010-01-01  0
2010-02-01  0
2010-03-01  1
2010-04-01  1
2010-05-01  0
2010-06-01  0
2010-07-01  0
2010-08-01  0
2010-09-01  1
2010-10-01  1
2010-11-01  1

What I want is to create a 2nd column that lists the # of months until the next occurrence of a 1.

That is, I need:

            R  F
2010-01-01  0  2
2010-02-01  0  1
2010-03-01  1  0
2010-04-01  1  0
2010-05-01  0  4
2010-06-01  0  3
2010-07-01  0  2
2010-08-01  0  1
2010-09-01  1  0
2010-10-01  1  0
2010-11-01  1  0

What I've tried: I haven't gotten far, but I'm able to fill in the first stretch, up to the first 1:

A = list(df.index)
T = df[df['R']==1]

a = df.index[0]
b = T.index[0]
c = A.index(b) - A.index(a)

df.loc[a:b, 'F'] = np.linspace(c,0,c+1)

            R    F
2010-01-01  0  2.0
2010-02-01  0  1.0
2010-03-01  1  0.0
2010-04-01  1  NaN
2010-05-01  0  NaN
2010-06-01  0  NaN
2010-07-01  0  NaN
2010-08-01  0  NaN
2010-09-01  1  NaN
2010-10-01  1  NaN
2010-11-01  1  NaN

EDIT: It probably would have been better to provide an example that spans multiple years:

Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))

df = pd.DataFrame({'R': Y},index = X)

asked Aug 09 '19 13:08

measure_theory

4 Answers

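Here is my way: a cumsum on R groups each 1 with the run of 0s that follows it, and a descending cumcount within each group gives the distance to the next 1.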

s=df.R.cumsum()
df.loc[df.R==0,'F']=s.groupby(s).cumcount(ascending=False)+1
# assign the result back; inplace=True on df.F is chained assignment and may not update df
df['F']=df['F'].fillna(0)

df
Out[12]: 
            R    F
2010-01-01  0  2.0
2010-02-01  0  1.0
2010-03-01  1  0.0
2010-04-01  1  0.0
2010-05-01  0  4.0
2010-06-01  0  3.0
2010-07-01  0  2.0
2010-08-01  0  1.0
2010-09-01  1  0.0
2010-10-01  1  0.0
2010-11-01  1  0.0
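
For reference, here is a minimal sketch of the same cumsum/cumcount idea run on the longer example from the question's edit, returning integers instead of floats. The helper name `rows_until_next_one` is made up for illustration. Note one assumption: trailing 0s with no later 1 are counted down to the end of the data, a case the question's examples never hit.

```python
import pandas as pd

def rows_until_next_one(r: pd.Series) -> pd.Series:
    """Count rows until the next 1 in a 0/1 Series (0 on rows that are 1)."""
    # cumsum puts each 1 in a group together with the 0s that follow it
    s = r.cumsum()
    # count down towards the end of each group
    countdown = s.groupby(s).cumcount(ascending=False) + 1
    # only rows where R == 0 keep the countdown; rows where R == 1 become 0
    return countdown.where(r.eq(0), 0).astype(int)

# the multi-year example from the question's edit
Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)

df["F"] = rows_until_next_one(df["R"])
print(df)
```

Using `where` on the countdown avoids the NaN-then-fillna step, which is what turns the column into floats.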

answered Oct 05 '22 13:10

BENY

Create a series containing your dates, mask this series when your R series is not equal to 1, bfill, and subtract!

u = df.index.to_series()

ii = u.where(df.R.eq(1)).bfill()

12 * (ii.dt.year - u.dt.year) + (ii.dt.month - u.dt.month)

2010-01-01    2
2010-02-01    1
2010-03-01    0
2010-04-01    0
2010-05-01    4
2010-06-01    3
2010-07-01    2
2010-08-01    1
2010-09-01    0
2010-10-01    0
2010-11-01    0
Freq: MS, dtype: int64
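
If the index has exactly one row per month with no gaps (an assumption, though it holds for both examples in the question), the same mask-and-bfill idea can be sketched with row positions instead of dates, which sidesteps the year/month arithmetic. The names `pos`, `next_one` and `steps` are only illustrative.

```python
import numpy as np
import pandas as pd

# the multi-year example from the question's edit
Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)

# row positions 0..n-1, indexed like df (assumes one row per month, no gaps)
pos = pd.Series(np.arange(len(df)), index=df.index)

# keep the position only where R == 1, then back-fill so every row
# sees the position of the next 1 at or after it
next_one = pos.where(df["R"].eq(1)).bfill()

# distance in rows; trailing 0s with no later 1 would come out as <NA>
steps = (next_one - pos).astype("Int64")
print(steps)
```

This counts rows, not calendar months, so on an index with missing months the date-based version above is the one to use.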

answered Oct 05 '22 14:10

user3483203

Here is a way that worked for me. It is not as elegant as @user3483203's, but it does the job.

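df = df.reset_index()  # integer index (dates move to an 'index' column), so j + 1 is the next row
df['F'] = 0
for i in df.index:
    j = i
    # step forward until a 1 is found or the rows run out
    while j < len(df) and df.loc[j, 'R'] == 0:
        df.loc[i, 'F'] += 1
        j += 1
df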

Out[39]: 
        index  R  F
0  2010-01-01  0  2
1  2010-02-01  0  1
2  2010-03-01  1  0
3  2010-04-01  1  0
4  2010-05-01  0  4
5  2010-06-01  0  3
6  2010-07-01  0  2
7  2010-08-01  0  1
8  2010-09-01  1  0
9  2010-10-01  1  0
10 2010-11-01  1  0
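
The nested loop rescans the same 0s for every row. As a rough sketch, a single backwards pass gives the same counts in one sweep; the function name `count_to_next_one` is hypothetical, and trailing 0s with no later 1 are simply counted to the end of the list (an assumption, since the question does not cover that case).

```python
import pandas as pd

def count_to_next_one(values):
    """Walk backwards once, carrying the distance to the most recently seen 1."""
    out = []
    dist = 0
    for v in reversed(values):
        # a 1 resets the distance; a 0 is one step further from the next 1
        dist = 0 if v == 1 else dist + 1
        out.append(dist)
    # results were collected back to front, so flip them
    return out[::-1]

Y = [0,0,1,1,0,0,0,0,1,1,1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)

# works on the original DatetimeIndex; no reset_index needed
df["F"] = count_to_next_one(df["R"].tolist())
print(df)
```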

answered Oct 05 '22 12:10

nidabdella

My take

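# each 1 starts its own group; each run of consecutive 0s forms one group
s = (df.R.diff().ne(0) | df.R.eq(1)).cumsum()
# count down within every group, then zero out rows where R == 1 (a lone 0 correctly keeps 1)
s.groupby(s).transform(lambda g: np.arange(len(g), 0, -1)).where(df.R.eq(0), 0)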

2010-01-01    2
2010-02-01    1
2010-03-01    0
2010-04-01    0
2010-05-01    4
2010-06-01    3
2010-07-01    2
2010-08-01    1
2010-09-01    0
2010-10-01    0
2010-11-01    0
Freq: MS, Name: R, dtype: int64

answered Oct 05 '22 13:10

rafaelc
