
Shift time series with missing dates in Pandas

Tags:

python

pandas

time-series

shift

I have a time series with some missing entries that looks like this:

date     value
---------------
2000       5
2001      10
2003      8
2004      72
2005      12
2007      13

I would like to create a "previous_value" column, but it should only be filled in for consecutive years. So I want it to look like this:

date     value    previous_value
-------------------------------
2000       5        nan
2001      10         5
2003      8         nan
2004      72         8
2005      12        72
2007      13        nan

However, applying pandas' shift() directly to the 'value' column would give 'previous_value' = 10 for 'date' = 2003, and 'previous_value' = 12 for 'date' = 2007.
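To make the problem concrete, here is a minimal sketch, assuming the table is loaded into a DataFrame with integer 'date' and 'value' columns. The column name `naive_previous` is hypothetical and only used for illustration:

```python
import pandas as pd

# The data from the table above, with years stored as plain integers.
df = pd.DataFrame({'date': [2000, 2001, 2003, 2004, 2005, 2007],
                   'value': [5, 10, 8, 72, 12, 13]})

# A plain shift() moves values down by one row and ignores gaps in 'date',
# so 2003 picks up 2001's value and 2007 picks up 2005's value.
df['naive_previous'] = df['value'].shift()
```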

What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' parameter.)


asked Mar 11 '15 21:03

user3591836

2 Answers

In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007],
                              'value':[5,10,8,72,12,13] })

In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]

In [590]: df
Out[590]: 
   date  value  previous_value
0  2000      5             NaN
1  2001     10               5
2  2003      8             NaN
3  2004     72               8
4  2005     12              72
5  2007     13             NaN
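The boolean mask keeps a shifted value only where the current year is exactly one more than the year in the previous row; assigning the filtered Series back aligns on the index, so the other rows become NaN. An equivalent sketch using diff() and where() (a restatement of the same idea, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'date': [2000, 2001, 2003, 2004, 2005, 2007],
                   'value': [5, 10, 8, 72, 12, 13]})

# True where the previous row is exactly one year earlier.
consecutive = df['date'].diff().eq(1)

# Keep the shifted value only on those rows; every other row becomes NaN.
df['previous_value'] = df['value'].shift().where(consecutive)
```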

Also see the question "Using shift() with unevenly spaced data" for a time series approach using resample().
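Because the years here are plain integers rather than timestamps, resample() does not apply directly. A sketch of the same "fill the gaps first" idea using reindex() instead (a swapped-in technique, assuming the years are unique): insert the missing years as NaN so that a one-row shift becomes a one-year shift, then look the original years back up.

```python
import pandas as pd

df = pd.DataFrame({'date': [2000, 2001, 2003, 2004, 2005, 2007],
                   'value': [5, 10, 8, 72, 12, 13]})

s = df.set_index('date')['value']

# Add every missing year as NaN, so consecutive rows are consecutive years.
full_years = range(s.index.min(), s.index.max() + 1)
prev = s.reindex(full_years).shift()

# Pick the shifted values for the original years only; to_numpy() avoids
# aligning the year index against df's default RangeIndex.
df['previous_value'] = prev.reindex(df['date']).to_numpy()
```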


answered Sep 25 '22 20:09

JohnE

Your example doesn't look like real time series data with timestamps. Let's take another example with the missing date 2020-01-03:

import pandas as pd

df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
df = df.drop(pd.Timestamp('2020-01-03'))  # remove one date to create a gap

            val
2020-01-01   10
2020-01-02   20
2020-01-04   40
2020-01-05   50

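To shift by one day, set the freq parameter to 'D'. With freq set, shift() moves the index labels forward by one day and leaves the data alone, instead of moving the values down by one row: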

df.shift(1, freq='D')

Output:

            val
2020-01-02   10
2020-01-03   20
2020-01-05   40
2020-01-06   50

To combine the original data with the shifted data, merge the two tables on their indexes:

df.merge(df.shift(1, freq='D'),
         left_index=True,
         right_index=True,
         how='left',
         suffixes=('', '_previous'))

Output:

            val  val_previous
2020-01-01   10           NaN
2020-01-02   20          10.0
2020-01-04   40           NaN
2020-01-05   50          40.0
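Since both tables share a DatetimeIndex, plain column assignment should give the same result as the left merge, because pandas aligns the assigned Series on the index. A minimal sketch of that alternative (not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
df = df.drop(pd.Timestamp("2020-01-03"))

# Assignment aligns on the index: dates with no match in the shifted
# Series (2020-01-01 and 2020-01-04) get NaN, just like the left merge.
df["val_previous"] = df["val"].shift(1, freq="D")
```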

You can find the other offset aliases here.
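The same freq idea can be applied to the yearly data from the question. A minimal sketch, assuming the integer years can be treated as January 1 timestamps and a pandas version that accepts the year-start alias 'YS'; the index name `year_start` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'date': [2000, 2001, 2003, 2004, 2005, 2007],
                   'value': [5, 10, 8, 72, 12, 13]})

# Turn each integer year into a Jan-1 timestamp and use it as the index.
years = pd.to_datetime(df['date'].astype(str), format='%Y')
df.index = pd.DatetimeIndex(years, name='year_start')

# Relabel every value with the following year start; the data is not moved.
shifted = df['value'].shift(1, freq='YS')

# Assignment aligns on the index, so years without a predecessor get NaN.
df['previous_value'] = shifted
```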

answered Sep 23 '22 20:09

Mykola Zotko
