Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shift time series with missing dates in Pandas

I have a times series with some missing entries, that looks like this:

date     value
---------------
2000       5
2001      10
2003      8
2004      72
2005      12
2007      13

I would like to do create a column for the "previous_value". But I only want it to show values for consecutive years. So I want it to look like this:

date     value    previous_value
-------------------------------
2000       5        nan
2001      10         5
2003      8         nan
2004      72         8
2005      12        72
2007      13        nan

However just applying pandas shift function directly to the column 'value' would give 'previous_value' = 10 for 'time' = 2003, and 'previous_value' = 12 for 'time' = 2007.

What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' attribute).

like image 653
user3591836 Avatar asked Mar 11 '15 21:03

user3591836


People also ask

How do I find missing dates in Python?

Using reindex() function to check missing dates Here we are typecasting the string type date into datetime type and with help of reindex() we are checking all the dates that are missing in the given data Frame and assign it to True otherwise assign it to False.


2 Answers

In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007],
                              'value':[5,10,8,72,12,13] })

In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]

In [590]: df
Out[590]: 
   date  value  previous_value
0  2000      5             NaN
1  2001     10               5
2  2003      8             NaN
3  2004     72               8
4  2005     12              72
5  2007     13             NaN

Also see here for a time series approach using resample(): Using shift() with unevenly spaced data

like image 119
JohnE Avatar answered Sep 25 '22 20:09

JohnE


Your example doesn't look like real time series data with timestamps. Let's take another example with the missing date 2020-01-03:

df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
df.drop(pd.Timestamp('2020-01-03'), inplace=True)

            val
2020-01-01   10
2020-01-02   20
2020-01-04   40
2020-01-05   50

To shift by one day you can set the freq parameter to 'D':

df.shift(1, freq='D')

Output:

            val
2020-01-02   10
2020-01-03   20
2020-01-05   40
2020-01-06   50

To combine original data with the shifted one you can merge both tables:

df.merge(df.shift(1, freq='D'),
         left_index=True,
         right_index=True,
         how='left',
         suffixes=('', '_previous'))

Output:

            val  val_previous
2020-01-01   10           NaN
2020-01-02   20          10.0
2020-01-04   40           NaN
2020-01-05   50          40.0

Other offset aliases you can find here

like image 40
Mykola Zotko Avatar answered Sep 23 '22 20:09

Mykola Zotko