Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Incrementing a counter based on some value with np.where

I'm trying to get a counter to increase as I process an Pandas series using np.where based on some time differencesin days. For example, if I have a series with the following values:

Date        Value
01/03/2017  5
02/03/2017  8
03/03/2017  3
04/03/2017  7
12/03/2017  1
13/03/2017  3
14/03/2017  4

I will go to a series that looks like this through this code

df['DIFF'] = df['Date'].diff()/np.timedelta64(1, 'D')

To produce this dataframe.

Date        Value  DIFF
01/03/2017  5      0
02/03/2017  8      1
03/03/2017  3      1
04/03/2017  7      1
12/03/2017  1      8
13/03/2017  3      1
14/03/2017  4      1

I then want to create a lifetime that counts the number of lives, on the assumption that a time diff greater than say 4 would be a new instance of a lifetime.

Date        Value  DIFF   LIFETIME
01/03/2017  5      0      1
02/03/2017  8      1      1
03/03/2017  3      1      1
04/03/2017  7      1      1
12/03/2017  1      8      2
13/03/2017  3      1      2
14/03/2017  4      1      2

I think I'm almost there with this code

df['LIFE'] = np.where(df['DIFF'] >=4, life_counter=df.shift(-1)+1, df.shift(-1))

The logic here would be that if the DIFF is greater than or equal 4, the I would set the LIFE variable to the previous + 1. Otherwise, it would be the same as the previous value. It seemed a neat way to carry state. However, my loop appears to ignore the state of what I've set, probably due to how np.where works. Does someone know a way to do what I'm doing and have it working. At present, my output looks like this.

Date        Value  DIFF   LIFETIME
01/03/2017  5      0      1
02/03/2017  8      1      1
03/03/2017  3      1      1
04/03/2017  7      1      1
12/03/2017  1      8      2
13/03/2017  3      1      1
14/03/2017  4      1      1
like image 968
Eamonn Avatar asked Nov 30 '25 06:11

Eamonn


1 Answers

I believe you simply want a cumulative sum on a boolean array, augmented by 1:

>>> df
         Date  Value  DIFF
0  01/03/2017      5     0
1  02/03/2017      8     1
2  03/03/2017      3     1
3  04/03/2017      7     1
4  12/03/2017      1     8
5  13/03/2017      3     1
6  14/03/2017      4     1
>>> df['LIFETIME'] = np.cumsum(df.DIFF >= 4) + 1
>>> df
         Date  Value  DIFF  LIFETIME
0  01/03/2017      5     0         1
1  02/03/2017      8     1         1
2  03/03/2017      3     1         1
3  04/03/2017      7     1         1
4  12/03/2017      1     8         2
5  13/03/2017      3     1         2
6  14/03/2017      4     1         2
like image 166
juanpa.arrivillaga Avatar answered Dec 02 '25 18:12

juanpa.arrivillaga