
Python pandas: calculate time until value in a column is less than it is in the current period

I have a pandas DataFrame in Python with several columns and a datetime stamp. I want to create a new column that calculates the time until output drops below its value in the current period.

My current table looks something like this:

 datetime               output
 2014-05-01 01:00:00    3
 2014-05-01 01:00:01    2
 2014-05-01 01:00:02    3
 2014-05-01 01:00:03    2
 2014-05-01 01:00:04    1

I'm trying to get my table to have an extra column and look like this:

 datetime               output     secondsuntildecrease
 2014-05-01 01:00:00    3         1
 2014-05-01 01:00:01    2         3
 2014-05-01 01:00:02    3         1
 2014-05-01 01:00:03    2         1
 2014-05-01 01:00:04    1         

Thanks in advance!

Clover asked Mar 12 '23 12:03


1 Answer

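import numpy as np
import pandas as pd
# True at (i, j) when output[j] < output[i]; np.triu drops comparisons with earlier rows
upper_triangle     = np.triu(df.output.values < df.output.values[:, None])
df['datetime']     = pd.to_datetime(df['datetime'])
# argmax gives the position of the first True in each row; .iloc selects by position
df['s_until_dec']  = df['datetime'].iloc[upper_triangle.argmax(axis=1)].values - df['datetime']
# rows with no later decrease contain no True at all, so blank them out
df.loc[~upper_triangle.any(axis=1), 's_until_dec'] = np.nan
df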
             datetime  output           s_until_dec
0 2014-05-01 01:00:00       3              00:00:01
1 2014-05-01 01:00:01       2              00:00:03
2 2014-05-01 01:00:02       3              00:00:01
3 2014-05-01 01:00:03       2              00:00:01
4 2014-05-01 01:00:04       1                   NaT
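
To try this end to end, here is a minimal, self-contained sketch that rebuilds the sample table from the question and wraps the same idea in a function. The helper name seconds_until_decrease is hypothetical (it is not part of pandas), and the conversion to plain seconds is an assumption made to match the secondsuntildecrease column the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "datetime": [
        "2014-05-01 01:00:00",
        "2014-05-01 01:00:01",
        "2014-05-01 01:00:02",
        "2014-05-01 01:00:03",
        "2014-05-01 01:00:04",
    ],
    "output": [3, 2, 3, 2, 1],
})
df["datetime"] = pd.to_datetime(df["datetime"])


def seconds_until_decrease(frame):
    """Hypothetical helper: seconds until `output` first drops below its current value."""
    values = frame["output"].to_numpy()
    times = frame["datetime"].to_numpy()
    # later_lower[i, j] is True when row j comes after row i and has a lower output.
    later_lower = np.triu(values < values[:, None])
    first_hit = later_lower.argmax(axis=1)
    # Convert the timedelta64 differences to float seconds.
    seconds = (times[first_hit] - times) / np.timedelta64(1, "s")
    # Rows with no later decrease get NaN instead of a meaningless value.
    seconds[~later_lower.any(axis=1)] = np.nan
    return pd.Series(seconds, index=frame.index)


df["secondsuntildecrease"] = seconds_until_decrease(df)
print(df)
```

Note that the comparison matrix has n x n entries, so memory use grows quadratically with the number of rows; for very long frames a different approach may be needed.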

Here's how it works:

df.output.values < df.output.values[:, None] creates a pairwise comparison matrix through broadcasting ([:, None] adds a new axis):

df.output.values < df.output.values[:, None]
Out: 
array([[False,  True, False,  True,  True],
       [False, False, False, False,  True],
       [False,  True, False,  True,  True],
       [False, False, False, False,  True],
       [False, False, False, False, False]], dtype=bool)
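
The shapes involved may make the broadcasting easier to follow. This is a small sketch using a plain NumPy array in place of df.output.values; the names row, col and pairwise are only illustrative.

```python
import numpy as np

values = np.array([3, 2, 3, 2, 1])   # the `output` column from the question

row = values            # shape (5,): behaves like a single row, indexed by j
col = values[:, None]   # shape (5, 1): a column, indexed by i

# Broadcasting stretches both operands to shape (5, 5),
# so pairwise[i, j] is (values[j] < values[i]).
pairwise = row < col

print(col.shape, pairwise.shape)
print(pairwise[0, 1])   # values[1] < values[0], i.e. 2 < 3
```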

Here, for example, output[1] is smaller than output[0], so the matrix element at (0, 1) is True. Only comparisons with later rows matter, so I used np.triu to keep just the upper triangle of this matrix. argmax(axis=1) then gives the position of the first True value in each row, and indexing the datetime column with those positions returns the corresponding date.

The last row is the exception: it is all False, so argmax returns 0 there and the result has to be replaced with np.nan. The .loc line uses any(axis=1) to find such rows and sets them to np.nan.
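
The intermediate arrays for the sample data can be inspected with a short sketch like the following. It again uses a plain NumPy array for the output column, and the variable names are only illustrative.

```python
import numpy as np

values = np.array([3, 2, 3, 2, 1])
pairwise = values < values[:, None]

# Keep only entries on or above the diagonal, i.e. drop comparisons with earlier rows.
upper = np.triu(pairwise)

first_true = upper.argmax(axis=1)   # position of the first True in each row
has_true = upper.any(axis=1)        # False for rows that never see a decrease

print(first_true)   # positions 1, 4, 3, 4, 0
print(has_true)     # True for rows 0 to 3, False for row 4
```

The final 0 in first_true does not mean that row 0 is a match: argmax returns 0 when a row contains no True at all, which is why the has_true mask is needed.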

ayhan answered Apr 26 '23 19:04