Is there a way to look back at the previous row and calculate a new variable? That is, as long as the previous row belongs to the same case, compute (previous change) - (current change) and attribute it to the previous row's 'ChangeEvent' in new columns?
Here is my DataFrame:
>>> df
  ChangeEvent StartEvent  case               change        open
0    Homeless   Homeless     1  2014-03-08 00:00:00  2014-02-08
1       other   Homeless     1  2014-04-08 00:00:00  2014-02-08
2    Homeless   Homeless     1  2014-05-08 00:00:00  2014-02-08
3        Jail   Homeless     1  2014-06-08 00:00:00  2014-02-08
4        Jail       Jail     2  2014-06-08 00:00:00  2014-02-08
I want to add columns like these:
Jail  Homeless  case
   0         6     1
   0        30     1
   0         0     1
... and so on.
Here is the code that builds the df:
import pandas as pd
import datetime as DT

d = {'case' : pd.Series([1,1,1,1,2]),
     'open' : pd.Series([DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2),
                         DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2),
                         DT.datetime(2014, 3, 2)]),
     'change' : pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
                           DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
                           DT.datetime(2014, 6, 8)]),
     'StartEvent' : pd.Series(['Homeless','Homeless','Homeless','Homeless','Jail']),
     'ChangeEvent' : pd.Series(['Homeless','irrelivant','Homeless','Jail','Jail']),
     'close' : pd.Series([DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2),
                          DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2),
                          DT.datetime(2015, 3, 2)])}
df = pd.DataFrame(d)
You can extract a column of a pandas DataFrame based on the value of another column by using the DataFrame.query() method, which filters the rows of a DataFrame with a boolean expression.
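As a rough illustration of query(), here is a minimal sketch. The small frame below is a cut-down stand-in for the question's data, and the variable name jail_cases is made up for this example.

```python
import pandas as pd

# Cut-down stand-in for the question's DataFrame (assumption: only two columns needed).
df = pd.DataFrame({
    "case": [1, 1, 1, 1, 2],
    "ChangeEvent": ["Homeless", "irrelivant", "Homeless", "Jail", "Jail"],
})

# Keep only the rows where the boolean expression holds, then pick one column.
jail_cases = df.query("ChangeEvent == 'Jail'")["case"]
print(jail_cases.tolist())
```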
The cumsum() method returns a DataFrame with the cumulative sum for each row. The cumsum() method goes through the values in the DataFrame, from the top, row by row, adding the values with the value from the previous row, ending up with a DataFrame where the last row contains the sum of all values for each column.
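A tiny sketch of what cumsum() does, using made-up day counts (the column name days and the values are illustrative only, not taken from the question):

```python
import pandas as pd

# Hypothetical day counts, just to show the running total.
days = pd.DataFrame({"days": [6, 30, 31]})

# Each row holds the sum of that row and all rows above it.
running = days.cumsum()
print(running)
```

The last row of running equals the column total, which is the property described above.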
The way to get the previous row's value is to use the shift method:
In [11]: df.change.shift(1)
Out[11]:
0          NaT
1   2014-03-08
2   2014-04-08
3   2014-05-08
4   2014-06-08
Name: change, dtype: datetime64[ns]
Now you can subtract these columns. Note: this was run with pandas 0.13.1 (datetime handling has had a lot of work recently, so YMMV with older versions).
In [12]: df.change.shift(1) - df.change
Out[12]:
0       NaT
1   -31 days
2   -30 days
3   -31 days
4     0 days
Name: change, dtype: timedelta64[ns]
You can just apply this to each case/group:
In [13]: df.groupby('case')['change'].apply(lambda x: x.shift(1) - x)
Out[13]:
0       NaT
1   -31 days
2   -30 days
3   -31 days
4       NaT
dtype: timedelta64[ns]
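To get from the per-case difference to "new columns per previous ChangeEvent", one possible sketch follows. It is only one reading of the question: the column names days_since_prev and prev_event are invented for this example, the sign is flipped relative to the snippet above (current minus previous) so that durations come out positive, and rows with no previous row in the same case get 0 in every event column.

```python
import datetime as DT
import pandas as pd

# Same 'case', 'change' and 'ChangeEvent' values as in the question's build code.
df = pd.DataFrame({
    "case": [1, 1, 1, 1, 2],
    "change": [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
    "ChangeEvent": ["Homeless", "irrelivant", "Homeless", "Jail", "Jail"],
})

grouped = df.groupby("case")

# Days since the previous change within the same case
# (NaN on the first row of each case, because there is nothing to look back to).
df["days_since_prev"] = (df["change"] - grouped["change"].shift(1)).dt.days

# The ChangeEvent of the previous row in the same case.
df["prev_event"] = grouped["ChangeEvent"].shift(1)

# One column per previous event: the elapsed days where that event applies, else 0.
for event in df["prev_event"].dropna().unique():
    df[event] = df["days_since_prev"].where(df["prev_event"] == event, 0)

print(df)
```

Using groupby(...).shift(1) rather than a plain shift(1) is what keeps the look-back from crossing from one case into the next.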