Resampling with custom periods

Tags:

pandas

Is there a 'cookbook' way of resampling a DataFrame with (semi)irregular periods?

I have a dataset at a daily interval and want to resample it to what the scientific literature sometimes calls dekads. I don't think there is a proper English term for it, but it basically means chopping a month into three roughly ten-day parts, where the third is the remainder and lasts anywhere between 8 and 11 days.
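To make that definition concrete, here is a minimal sketch of how a date maps to its dekad within the month. The helper name `dekad_of` is hypothetical and only used for illustration:

```python
import datetime

def dekad_of(day):
    """Return the dekad (1, 2 or 3) of a date within its month.

    Hypothetical helper: days 1-10 fall in dekad 1, days 11-20 in
    dekad 2, and day 21 through the end of the month in dekad 3.
    """
    return min((day.day - 1) // 10, 2) + 1

print(dekad_of(datetime.date(2013, 1, 10)))  # 1
print(dekad_of(datetime.date(2013, 1, 11)))  # 2
print(dekad_of(datetime.date(2013, 1, 31)))  # 3
```

The `min(..., 2)` is what folds day 31 (and any other day past the 30th) into the third dekad instead of starting a fourth one.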

I came up with two solutions myself: a specific one for this case and a more general one for any irregular periods. Neither is really good, though, so I'm curious how others handle this type of situation.

Let's start by creating some sample data:

import numpy as np
import pandas as pd

begin = pd.Timestamp(2013, 1, 1)
end = pd.Timestamp(2013, 2, 20)

dtrange = pd.date_range(begin, end)

p1 = np.random.rand(len(dtrange)) + 5
p2 = np.random.rand(len(dtrange)) + 10

df = pd.DataFrame({'p1': p1, 'p2': p2}, index=dtrange)

The first thing I came up with is grouping by individual months (YYYYMM) and then slicing each month manually, like this:

def to_dec1(data, func):

    # create the indexes: the start of each ~10-day period
    idx1 = pd.Timestamp(data.index[0].year, data.index[0].month, 1)
    idx2 = idx1 + pd.Timedelta(days=10)
    idx3 = idx2 + pd.Timedelta(days=10)

    # slice each period (label slices are inclusive) and apply the function
    oneday = pd.Timedelta(days=1)
    fir = func(data.loc[:idx2 - oneday].values, axis=0)
    sec = func(data.loc[idx2:idx3 - oneday].values, axis=0)
    thi = func(data.loc[idx3:].values, axis=0)

    return pd.DataFrame([fir, sec, thi], index=[idx1, idx2, idx3], columns=data.columns)

dfmean = df.groupby(lambda x: x.strftime('%Y%m'), group_keys=False).apply(to_dec1, np.mean)

Which results in:

print(dfmean)

                  p1         p2
2013-01-01  5.436778  10.409845
2013-01-11  5.534509  10.482231
2013-01-21  5.449058  10.454777
2013-02-01  5.685700  10.422697
2013-02-11  5.578137  10.532180
2013-02-21       NaN        NaN

Note that you always get a full month of dekads in return; that's not a problem, and the empty ones are easy to remove if needed.

The other solution works by providing a range of dates at which to chop up the DataFrame and then applying a function to each segment. It's more flexible in terms of the periods you can use.

def to_dec2(data, dts, func):

    chunks = []
    for n, start in enumerate(dts[:-1]):

        # each segment runs up to the day before the next start date
        end = dts[n+1] - pd.Timedelta(days=1)
        chunks.append(func(data.loc[start:end].values, axis=0))

    return pd.DataFrame(chunks, index=dts[:-1], columns=data.columns)

dfmean2 = to_dec2(df, dfmean.index, np.mean)

Note that I'm using the index of the previous result as the range of dates, to save some time 'building' it myself.

What would be the best way of handling these cases? Is there perhaps a more built-in method in pandas?
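For reference, the range of dates could also be built directly rather than borrowed from an earlier result. The following is only a sketch under that assumption; the helper name `dekad_starts` is hypothetical:

```python
import numpy as np
import pandas as pd

def dekad_starts(begin, end):
    """Return every dekad start date (day 1, 11 or 21) from begin to end.

    Hypothetical helper: builds a daily range and keeps only the days
    on which a dekad begins.
    """
    days = pd.date_range(begin, end, freq="D")
    return days[np.isin(days.day, [1, 11, 21])]

dts = dekad_starts("2013-01-01", "2013-03-01")
print(dts)
```

Because the function above treats the last date only as a closing boundary, the range here runs through 2013-03-01 so that the dekad starting on 2013-02-21 is still included as a segment.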


asked Mar 14 '13 11:03

Rutger Kassies

2 Answers

If you use NumPy 1.7 or later, you can use datetime64 and timedelta64 arrays to do the calculation.

Create the sample data:

import pandas as pd
import numpy as np

begin = pd.Timestamp(2013, 1, 1)
end = pd.Timestamp(2013, 2, 20)

dtrange = pd.date_range(begin, end)

p1 = np.random.rand(len(dtrange)) + 5
p2 = np.random.rand(len(dtrange)) + 10

df = pd.DataFrame({'p1': p1, 'p2': p2}, index=dtrange)

Calculate the start date of each dekad:

d = df.index.day - np.clip((df.index.day-1) // 10, 0, 2)*10 - 1
date = df.index.values - np.array(d, dtype="timedelta64[D]")
df.groupby(date).mean()

The output is:

                 p1         p2
2013-01-01  5.413795  10.445640
2013-01-11  5.516063  10.491339
2013-01-21  5.539676  10.528745
2013-02-01  5.783467  10.478001
2013-02-11  5.358787  10.579149
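To see why the expression for `d` works, it may help to run it on a handful of day-of-month values. This is only an illustrative sketch; the names `day`, `dekad` and `start_day` are introduced here and do not appear in the answer above:

```python
import numpy as np

# A few representative days of the month, including both ends of each dekad.
day = np.array([1, 10, 11, 20, 21, 28, 31])

# Zero-based dekad number; np.clip folds day 31 into the third dekad.
dekad = np.clip((day - 1) // 10, 0, 2)

# Number of days elapsed since the dekad began.
d = day - dekad * 10 - 1

# Subtracting that offset always lands on day 1, 11 or 21.
start_day = day - d

print(d.tolist())          # [0, 9, 0, 9, 0, 7, 10]
print(start_day.tolist())  # [1, 1, 11, 11, 21, 21, 21]
```

Subtracting `d` days from each timestamp therefore maps every date onto the first day of its dekad, which is what makes the result usable as a group key.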


answered Oct 19 '22 18:10

HYRY

Using HYRY's data and solution up to the computation of the d variable, we can also do the following in pandas 0.11-dev or later (regardless of numpy version):

In [18]: from datetime import timedelta

In [23]: pd.Series([ timedelta(int(i)) for i in d ])
Out[23]: 
0             00:00:00
1     1 days, 00:00:00
2     2 days, 00:00:00
3     3 days, 00:00:00
4     4 days, 00:00:00
5     5 days, 00:00:00
6     6 days, 00:00:00
7     7 days, 00:00:00
8     8 days, 00:00:00
9     9 days, 00:00:00
10            00:00:00

47    6 days, 00:00:00
48    7 days, 00:00:00
49    8 days, 00:00:00
50    9 days, 00:00:00
Length: 51, dtype: timedelta64[ns]

The date is then constructed similarly to the above:

date = pd.Series(df.index) - pd.Series([ timedelta(int(i)) for i in d ])
df.groupby(date.values).mean()
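On pandas versions that provide `pd.to_timedelta`, the list comprehension could probably be replaced by a vectorized conversion. The sketch below assumes such a version and uses deterministic sample values (not the random data above) so the result is easy to check by hand:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", "2013-02-20")
# Deterministic stand-in data: 0.0, 1.0, ..., 50.0
df = pd.DataFrame({"p1": np.arange(len(idx), dtype=float)}, index=idx)

# Same offset as the d variable: days elapsed since the dekad began.
day = np.asarray(idx.day, dtype="int64")
d = day - np.clip((day - 1) // 10, 0, 2) * 10 - 1

# Vectorized conversion of the integer offsets to timedeltas.
date = idx - pd.to_timedelta(d, unit="D")

result = df.groupby(date).mean()
print(result)
```

With these values, the five groups start on 2013-01-01, 2013-01-11, 2013-01-21, 2013-02-01 and 2013-02-11, and the third one averages eleven days rather than ten.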

answered Oct 19 '22 18:10

Jeff
