
Get the previous row's value and calculate a new column in pandas (Python)

Tags:

python

pandas

Is there a way to look back to the previous row and calculate a new variable? As long as the previous row belongs to the same case, I want to compute (previous change) - (current change) and attribute it to the previous row's 'ChangeEvent' in new columns.

Here is my DataFrame:

    >>> df
      ChangeEvent StartEvent  case              change       open
    0    Homeless   Homeless     1 2014-03-08 00:00:00 2014-02-08
    1       other   Homeless     1 2014-04-08 00:00:00 2014-02-08
    2    Homeless   Homeless     1 2014-05-08 00:00:00 2014-02-08
    3        Jail   Homeless     1 2014-06-08 00:00:00 2014-02-08
    4        Jail       Jail     2 2014-06-08 00:00:00 2014-02-08

I would like to add columns like these:

    Jail  Homeless  case
    0     6         1
    0     30        1
    0     0         1

... and so on.

Here is the code that builds the df:

    import pandas as pd
    import datetime as DT

    d = {'case': pd.Series([1, 1, 1, 1, 2]),
         'open': pd.Series([DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2),
                            DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2),
                            DT.datetime(2014, 3, 2)]),
         'change': pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
                              DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
                              DT.datetime(2014, 6, 8)]),
         'StartEvent': pd.Series(['Homeless', 'Homeless', 'Homeless', 'Homeless', 'Jail']),
         'ChangeEvent': pd.Series(['Homeless', 'irrelivant', 'Homeless', 'Jail', 'Jail']),
         'close': pd.Series([DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2),
                             DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2),
                             DT.datetime(2015, 3, 2)])}
    df = pd.DataFrame(d)
Chet Meinzer asked Feb 27 '14 22:02




1 Answer

The way to get the previous row's value is to use the shift method:

    In [11]: df.change.shift(1)
    Out[11]:
    0          NaT
    1   2014-03-08
    2   2014-04-08
    3   2014-05-08
    4   2014-06-08
    Name: change, dtype: datetime64[ns]

Now you can subtract the original column from the shifted one. Note: this is with pandas 0.13.1 (datetime handling has had a lot of work recently, so YMMV with older versions).

    In [12]: df.change.shift(1) - df.change
    Out[12]:
    0        NaT
    1   -31 days
    2   -30 days
    3   -31 days
    4     0 days
    Name: change, dtype: timedelta64[ns]
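As a self-contained illustration of the shift-and-subtract step, here is a minimal sketch on a standalone Series holding the question's `change` dates. The names `change`, `prev`, `delta` and `days` are made up for this example, and it assumes a reasonably recent pandas where the `.dt` accessor is available on timedelta Series.

```python
import pandas as pd

# The 'change' dates from the question's DataFrame
change = pd.Series(pd.to_datetime(
    ['2014-03-08', '2014-04-08', '2014-05-08', '2014-06-08', '2014-06-08']))

# Value from the previous row; the first row has no predecessor, so it is NaT
prev = change.shift(1)

# Previous minus current, the same order of subtraction as in the answer
delta = prev - change

# Whole days as numbers; NaN where there is no previous row
days = delta.dt.days
```

Subtracting in the other order (`change - prev`) would give positive durations, if that is more convenient.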

You can just apply this to each case/group:

    In [13]: df.groupby('case')['change'].apply(lambda x: x.shift(1) - x)
    Out[13]:
    0        NaT
    1   -31 days
    2   -30 days
    3   -31 days
    4        NaT
    dtype: timedelta64[ns]
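On newer pandas versions, `groupby(...).apply` with a lambda like this may return a result indexed by the group key as well as the original index, which is awkward to assign back as a column. A possible alternative, sketched below, is `groupby(...).shift()`, which keeps the original index. The second half is only one reading of what the question asks for: it assumes the time spent in an event is the gap until the next change in the same case, and spreads it into one column per event. The column names `prev_minus_current` and `days_in_event` and the frames `wide` and `result` are hypothetical.

```python
import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'case': [1, 1, 1, 1, 2],
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
    'ChangeEvent': ['Homeless', 'irrelivant', 'Homeless', 'Jail', 'Jail'],
})

# Previous row's change date within the same case (NaT at the start of each case)
prev_change = df.groupby('case')['change'].shift(1)
df['prev_minus_current'] = prev_change - df['change']

# Assumption: the time spent in an event is the gap until the NEXT change in
# the same case, credited to the row where that event started.
next_change = df.groupby('case')['change'].shift(-1)
df['days_in_event'] = (next_change - df['change']).dt.days

# One column per event; rows with no following change in their case become 0
wide = (df.pivot(columns='ChangeEvent', values='days_in_event')
          .fillna(0)
          .astype(int))
result = df[['case']].join(wide)
```

With this data, `prev_minus_current` reproduces the per-case output shown above, and `result` has one row per original row with the day counts in the `Homeless`, `Jail` and `irrelivant` columns.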
Andy Hayden answered Oct 15 '22 15:10