Resample in a rolling window using pandas

Tags:

Assume I have daily data (not regularly spaced), I want to compute for each month the moving standard deviation (or an arbitrarily non linear function) in the past 5 months. For example, for May 2012 I would compute the stddev from the period starting from Jan 2012 to May 2012 (5 months). For June 2012 the period starts in Feb 2012, etc. The final result is a time series with monthly values.

I cannot apply a rolling window because this would first be daily and secondly I need to specify the number of values (a rolling window does not aggregate by time frame, some posts addressed this issue but they are not relevant to my problem as the rolling would still be for each new day).

I cannot apply resampling, because then the sample would be every 5 months, e..g I would only have values for May 2012, Oct 2012, March 2013... Finally, as the function is not linear I cannot reconstruct it by first doing a monthly sample and then applying a 5 period rolling window on it.

So I would need a sort of resampling functionality applied to a rolling window defined by time interval (not number of values).

How can I do this in pandas? One approach could be to combine several (5 in this example) resampled (5 months) time series, each with one month of offset and then align all these series into one... but I do not know how to implement this.

401

asked Jul 21 '14 22:07

Mannaggia

3 Answers

I had a similar issue dealing with a timedelta series where I wanted to take a moving average and then resample. Here is an example where I have 100 seconds of data. I take a rolling average of 10 second windows and then resample for every 5 seconds, taking the first entry in each resample bin. The result should be the previous 10 second average at 5 second increments. You could do something similar with month format instead of seconds:

df = pd.DataFrame(range(0,100), index=pd.TimedeltaIndex(range(0,100),'s'))
df.rolling('10s').mean().resample('5s').first()

Result:

             0
00:00:00   0.0
00:00:05   2.5
00:00:10   5.5
00:00:15  10.5
00:00:20  15.5
00:00:25  20.5
00:00:30  25.5
00:00:35  30.5
00:00:40  35.5
00:00:45  40.5
00:00:50  45.5
00:00:55  50.5
00:01:00  55.5
00:01:05  60.5
00:01:10  65.5
00:01:15  70.5
00:01:20  75.5
00:01:25  80.5
00:01:30  85.5
00:01:35  90.5

answered Nov 05 '22 02:11

Jonathan Koren

Here's an attempt - not super clean, but it might work.

Dummy data:

df = pd.DataFrame(data={'a': 1.}, 
                  index=pd.date_range(start='2001-1-1', periods=1000))

First define a function to decrease a date n number of months. This needs to be cleaned up, but works for n<=12.

from datetime import datetime    
def decrease_month(date, n):
    assert(n <= 12)

    new_month = date.month - n
    year_offset = 0
    if new_month <= 0:
        year_offset = -1
        new_month = 12 + new_month

    return datetime(date.year + year_offset, new_month, 1)

Then, add 5 new columns for the 5 rolling periods that each date will cross.

for n in range(rolling_period):
    df['m_' + str(n)] = df.index.map(lambda x: decrease_month(x, n))

Then - use the melt function to convert the data from wide to long, so each rolling period will have one entry.

df_m = pd.melt(df, id_vars='a')

You should be able to groupby the newly created column, and each date will represent the right 5 month rolling period.

In [222]: df_m.groupby('value').sum()
Out[222]: 
              a
value          
2000-09-01   31
2000-10-01   59
2000-11-01   90
2000-12-01  120
2001-01-01  151
2001-02-01  150
2001-03-01  153
2001-04-01  153
2001-05-01  153
2001-06-01  153
2001-07-01  153
...

answered Nov 05 '22 02:11

chrisb

I have solved a similar problem with the following code:

interval = 5
frames = []
for base in range(interval):
  frame = data.resample(f"{interval}min", base=base).last()
  frames.append(frame)

pd.concat(frames, axis=0).sort_index()

Here I create 5 data frames which are resampled at the same interval, but have different offsets (the base parameter). Then I just have to concatenate and sort them. Should usually be much more efficient than rolling + resampling (the only overhead is the sorting).

answered Nov 05 '22 03:11

Joel

Related questions
                            
                                N-queen backtracking in Python: how to return solutions instead of printing them?
                            
                                Merge/Join/Append two Pandas DataFrames with MultiIndex columns by both index and cols
                            
                                Robust tracking of blobs
                            
                                Is there any solution for the QtWebKit memory leak?
                            
                                Calculate difference between multiples of two different numbers
                            
                                python 3 pip install non-ASCII payload error
                            
                                Python Selenium wait for several elements to load
                            
                                How to make a widget span multiple columns/rows in gridlayout in kivy
                            
                                Log HTTP traffic with Webdriver and PhantomJS
                            
                                Amending URL for "list_display_links" in Django 1.6 admin change list
                            
                                Tested implementation of APriori and FP-growth in python [closed]
                            
                                Python passing callback function in a class
                            
                                Python 3.4 - Library for 2D graphics [closed]
                            
                                Problems with TCP hole punching
                            
                                Python smtplib.sendmail Mime Multipart body doesn't shown on iPhone
                            
                                How to change the window title in pyside?
                            
                                python3 - behaviour of super() on multi-inheritance
                            
                                Designing a Regex to find any Noun Phrase
                            
                                Pandas DataFrame Object Inheritance or Object Use?
                            
                                How to compute which way data points continue beyond an intersection?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Resample in a rolling window using pandas

Tags:

python

pandas

resampling