I have a times series with some missing entries, that looks like this:
date     value
---------------
2000       5
2001      10
2003      8
2004      72
2005      12
2007      13
I would like to do create a column for the "previous_value". But I only want it to show values for consecutive years. So I want it to look like this:
date     value    previous_value
-------------------------------
2000       5        nan
2001      10         5
2003      8         nan
2004      72         8
2005      12        72
2007      13        nan
However just applying pandas shift function directly to the column 'value' would give 'previous_value' = 10 for 'time' = 2003, and 'previous_value' = 12 for 'time' = 2007.
What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' attribute).
Using reindex() function to check missing dates Here we are typecasting the string type date into datetime type and with help of reindex() we are checking all the dates that are missing in the given data Frame and assign it to True otherwise assign it to False.
In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007],
                              'value':[5,10,8,72,12,13] })
In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]
In [590]: df
Out[590]: 
   date  value  previous_value
0  2000      5             NaN
1  2001     10               5
2  2003      8             NaN
3  2004     72               8
4  2005     12              72
5  2007     13             NaN
Also see here for a time series approach using resample():  Using shift() with unevenly spaced data
Your example doesn't look like real time series data with timestamps. Let's take another example with the missing date 2020-01-03:
df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
df.drop(pd.Timestamp('2020-01-03'), inplace=True)
            val
2020-01-01   10
2020-01-02   20
2020-01-04   40
2020-01-05   50
To shift by one day you can set the freq parameter to 'D':
df.shift(1, freq='D')
Output:
            val
2020-01-02   10
2020-01-03   20
2020-01-05   40
2020-01-06   50
To combine original data with the shifted one you can merge both tables:
df.merge(df.shift(1, freq='D'),
         left_index=True,
         right_index=True,
         how='left',
         suffixes=('', '_previous'))
Output:
            val  val_previous
2020-01-01   10           NaN
2020-01-02   20          10.0
2020-01-04   40           NaN
2020-01-05   50          40.0
Other offset aliases you can find here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With