
Incremental Counter flag for a matching condition on subsequent time series data

Tags:

python

pandas

I have a dataframe that looks like this:

ID      DATE          PROFIT
2342  2017-03-01       457
2342  2017-06-01       658
2342  2017-09-01       3456
2342  2017-12-01       345
2342  2018-03-01       235
2342  2018-06-01       23
808   2016-12-01       200        
808   2017-03-01       9346
808   2017-06-01       54
808   2017-09-01       314
808   2017-12-01       57
....
....
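To try the answers below, the sample rows above can be rebuilt as a DataFrame. This is a minimal sketch that includes only the rows shown; the elided "...." rows are not reproduced. Parsing DATE with pd.to_datetime is an assumption about how the data is stored.

```python
import pandas as pd

# Only the rows shown in the question; the elided "...." rows are not included.
df = pd.DataFrame({
    'ID': [2342] * 6 + [808] * 5,
    'DATE': pd.to_datetime([
        '2017-03-01', '2017-06-01', '2017-09-01', '2017-12-01', '2018-03-01', '2018-06-01',
        '2016-12-01', '2017-03-01', '2017-06-01', '2017-09-01', '2017-12-01',
    ]),
    'PROFIT': [457, 658, 3456, 345, 235, 23, 200, 9346, 54, 314, 57],
})
print(df)
```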

For each ID:

I want to find out whether PROFIT has stayed between 200 and 1000 (inclusive, since a profit of exactly 200 counts in the expected output below). I want a counter (a new column) that indicates how many consecutive quarters, up to and including the current one, have satisfied this condition. If an intermediate quarter does not match the condition, the counter should reset to 0.

So the output should look something like this:

ID      DATE          PROFIT    COUNTER
2342  2017-03-01       457        1
2342  2017-06-01       658        2
2342  2017-09-01       3456       0
2342  2017-12-01       345        1
2342  2018-03-01       235        2
2342  2018-06-01       23         0
808   2016-12-01       200        1
808   2017-03-01       9346       0
808   2017-06-01       54         0
808   2017-09-01       314        1
808   2017-12-01       57         0
....
....

I am thinking of using the shift function to access and condition on the previous rows, but if there is a better way to check a condition across successive dates, I would like to know.
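Before vectorizing, it may help to pin down the reset rule with a plain loop over one ID's values. This sketch assumes the rows are already sorted by DATE within each ID and treats the bounds as inclusive, matching the expected output above; the function name run_counter is hypothetical.

```python
def run_counter(profits, lo=200, hi=1000):
    """Count consecutive in-range values; reset to 0 on any out-of-range value."""
    counts = []
    streak = 0
    for p in profits:
        # inclusive bounds, so a profit of exactly 200 still counts
        streak = streak + 1 if lo <= p <= hi else 0
        counts.append(streak)
    return counts

# Per ID, using the PROFIT values from the question (already in DATE order).
print(run_counter([457, 658, 3456, 345, 235, 23]))  # [1, 2, 0, 1, 2, 0]
print(run_counter([200, 9346, 54, 314, 57]))        # [1, 0, 0, 1, 0]
```

The answers below compute the same streaks without a Python-level loop.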

asked May 01 '19 00:05 by asimo


2 Answers

IIUC, create a helper key with cumsum; it increments at every row that falls outside the range. Then compute the counter only on the in-range rows, assign it back, and fill the rows that are not between 200 and 1000 with 0.

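# helper key: running count of out-of-range rows within each ID, so it increments at every break
s = (~df.PROFIT.between(200, 1000)).groupby(df['ID']).cumsum()
# number the in-range rows within each (ID, run) group, starting at 1
df['COUNTER'] = df[df.PROFIT.between(200, 1000)].groupby([df.ID, s]).cumcount() + 1
# rows dropped by the filter come back as NaN on assignment; set them to 0
# (assign instead of fillna(inplace=True) on df.COUNTER, which is chained assignment)
df['COUNTER'] = df['COUNTER'].fillna(0)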
df
Out[226]: 
      ID        DATE  PROFIT  COUNTER
0   2342  2017-03-01     457      1.0
1   2342  2017-06-01     658      2.0
2   2342  2017-09-01    3456      0.0
3   2342  2017-12-01     345      1.0
4   2342  2018-03-01     235      2.0
5   2342  2018-06-01      23      0.0
6    808  2016-12-01     200      1.0
7    808  2017-03-01    9346      0.0
8    808  2017-06-01      54      0.0
9    808  2017-09-01     314      1.0
10   808  2017-12-01      57      0.0
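To see why grouping on [df.ID, s] separates the runs, it can help to print the helper key on the sample data. A sketch, rebuilding only the rows shown in the question (DATE is omitted because the key does not use it):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [2342] * 6 + [808] * 5,
    'PROFIT': [457, 658, 3456, 345, 235, 23, 200, 9346, 54, 314, 57],
})

in_range = df.PROFIT.between(200, 1000)  # inclusive on both ends by default
# s increases by 1 at every out-of-range row, separately within each ID,
# so each uninterrupted run of in-range rows shares one (ID, s) pair.
s = (~in_range).groupby(df['ID']).cumsum()
print(pd.DataFrame({'ID': df.ID, 'PROFIT': df.PROFIT, 'in_range': in_range, 's': s}))
```

For example, the rows with PROFIT 345 and 235 share the key (2342, 1), so cumcount numbers them 0 and 1, which becomes 1 and 2 after adding 1.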
answered Nov 19 '22 19:11 by BENY


Set up a criteria column that is 1 where the row meets the criteria (and 0 otherwise), then group and take the cumulative sum.

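# 1 if PROFIT is within [200, 1000], else 0
df['criteria'] = 0
df.loc[(df['PROFIT'] >= 200) & (df['PROFIT'] <= 1000), 'criteria'] = 1

# every 0 starts a new block; within each (ID, block) group the cumulative sum counts consecutive 1s
df['result'] = df.groupby(['ID', df.criteria.eq(0).cumsum()])['criteria'].cumsum()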


     ID        DATE  PROFIT  criteria  result
0   2342  2017-03-01     457         1       1
1   2342  2017-06-01     658         1       2
2   2342  2017-09-01    3456         0       0
3   2342  2017-12-01     345         1       1
4   2342  2018-03-01     235         1       2
5   2342  2018-06-01      23         0       0
6    808  2016-12-01     200         1       1
7    808  2017-03-01    9346         0       0
8    808  2017-06-01      54         0       0
9    808  2017-09-01     314         1       1
10   808  2017-12-01      57         0       0
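Both answers assume the rows are already in DATE order within each ID. If that is not guaranteed, a sketch that sorts first might look like this. It builds the 0/1 flag with Series.between (inclusive on both ends by default) instead of two comparisons, and the input is simply the question's rows listed in reverse to exercise the sort.

```python
import pandas as pd

# Question's sample rows, deliberately listed in reverse order to exercise the sort.
df = pd.DataFrame({
    'ID': [808] * 5 + [2342] * 6,
    'DATE': pd.to_datetime([
        '2017-12-01', '2017-09-01', '2017-06-01', '2017-03-01', '2016-12-01',
        '2018-06-01', '2018-03-01', '2017-12-01', '2017-09-01', '2017-06-01', '2017-03-01',
    ]),
    'PROFIT': [57, 314, 54, 9346, 200, 23, 235, 345, 3456, 658, 457],
})

# Order rows chronologically within each ID before counting runs.
df = df.sort_values(['ID', 'DATE']).reset_index(drop=True)

# 1 if PROFIT is within [200, 1000] (inclusive), else 0
df['criteria'] = df['PROFIT'].between(200, 1000).astype(int)
# every 0 starts a new block; cumulative sum within (ID, block) counts the streak
block = df['criteria'].eq(0).cumsum().rename('block')
df['result'] = df.groupby(['ID', block])['criteria'].cumsum()
print(df)
```

Because sort_values orders IDs ascending, 808 now appears before 2342; the per-ID counts match the expected output above.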
answered Nov 19 '22 21:11 by run-out