I have a times series with some missing entries, that looks like this:
date value
---------------
2000 5
2001 10
2003 8
2004 72
2005 12
2007 13
I would like to do create a column for the "previous_value". But I only want it to show values for consecutive years. So I want it to look like this:
date value previous_value
-------------------------------
2000 5 nan
2001 10 5
2003 8 nan
2004 72 8
2005 12 72
2007 13 nan
However just applying pandas shift function directly to the column 'value' would give 'previous_value' = 10 for 'time' = 2003, and 'previous_value' = 12 for 'time' = 2007.
What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' attribute).
Using reindex() function to check missing dates Here we are typecasting the string type date into datetime type and with help of reindex() we are checking all the dates that are missing in the given data Frame and assign it to True otherwise assign it to False.
In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007],
'value':[5,10,8,72,12,13] })
In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]
In [590]: df
Out[590]:
date value previous_value
0 2000 5 NaN
1 2001 10 5
2 2003 8 NaN
3 2004 72 8
4 2005 12 72
5 2007 13 NaN
Also see here for a time series approach using resample()
: Using shift() with unevenly spaced data
Your example doesn't look like real time series data with timestamps. Let's take another example with the missing date 2020-01-03
:
df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
index=pd.date_range("2020-01-01", "2020-01-05"))
df.drop(pd.Timestamp('2020-01-03'), inplace=True)
val
2020-01-01 10
2020-01-02 20
2020-01-04 40
2020-01-05 50
To shift by one day you can set the freq
parameter to 'D':
df.shift(1, freq='D')
Output:
val
2020-01-02 10
2020-01-03 20
2020-01-05 40
2020-01-06 50
To combine original data with the shifted one you can merge both tables:
df.merge(df.shift(1, freq='D'),
left_index=True,
right_index=True,
how='left',
suffixes=('', '_previous'))
Output:
val val_previous
2020-01-01 10 NaN
2020-01-02 20 10.0
2020-01-04 40 NaN
2020-01-05 50 40.0
Other offset aliases you can find here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With