 

pandas rolling functions with time groupby

Tags:

python

pandas

Here is my problem. I have a DataFrame as follows:

df:

2013-10-24      1
2013-10-25      2
2013-11-27      3 
2013-11-28      4
2013-12-01      5 
2013-12-02      6

What I want is a DataFrame like this:

rolling_mean(df, window='1M'):

2013-10      1.5
2013-11      3.5
2013-12      5.5 

rolling_mean(df, window='2M'):

2013-10      NAN
2013-11      2.5
2013-12      4.5 

rolling_mean(df, window='3M'):

2013-10      NAN
2013-11      NAN
2013-12      3.5 

rolling_mean(df, window='1Y'):

2013-10      NAN
2013-11      NAN
2013-12      NAN

where 1M means '1 month' and 2M means '2 months'. The window is not an int value but a time interval such as '1D', '3M', '1Y', and so on. The function should group the DataFrame by the time unit ('D', 'M', 'Y') and then roll over the number of units given before it (1, 3, ...).

I need a rolling function like this. Could anybody help me? Is my description clear? Many thanks.
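A minimal sketch of the behaviour described above, assuming the window is a whole number of calendar months (so '1Y' is passed as 12) and the data is a single Series with a DatetimeIndex. `rolling_by_months` is a hypothetical helper, not a pandas API: for each calendar month it collects the raw values from that month and the months before it, and returns NaN when the window reaches back before the first month of data.

```python
import numpy as np
import pandas as pd


def rolling_by_months(s, n_months, func):
    """Hypothetical helper (not part of pandas).

    For every calendar month present in s, apply func to the raw values of
    that month and the n_months - 1 months before it. Windows that start
    before the first month of data yield NaN.
    """
    periods = s.index.to_period('M')        # 2013-10-24 -> Period('2013-10', 'M')
    months = periods.unique().sort_values()
    first = months[0]
    result = {}
    for month in months:
        start = month - (n_months - 1)      # Period arithmetic shifts by whole months
        if start < first:
            result[month] = np.nan          # not enough history for a full window
        else:
            mask = (periods >= start) & (periods <= month)
            result[month] = func(s[mask].to_numpy())
    return pd.Series(result, dtype=float)


s = pd.Series([1, 2, 3, 4, 5, 6],
              index=pd.to_datetime(['2013-10-24', '2013-10-25', '2013-11-27',
                                    '2013-11-28', '2013-12-01', '2013-12-02']))

print(rolling_by_months(s, 1, np.mean).tolist())   # [1.5, 3.5, 5.5]
print(rolling_by_months(s, 2, np.mean).tolist())   # [nan, 2.5, 4.5]
print(rolling_by_months(s, 3, np.mean).tolist())   # [nan, nan, 3.5]
print(rolling_by_months(s, 12, np.mean).tolist())  # [nan, nan, nan]
```

Because `func` receives the raw values rather than monthly aggregates, the same helper can be called with `np.std` for a standard deviation over each window. Daily or yearly units would need a different period frequency, which this sketch does not handle.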

Update:

I still have a puzzle. I need to implement a function that calculates a rolling standard deviation over the individual daily values (not over values resampled by month), while the window size is measured in months.

In this scenario, I have the same df:

2013-10-24      1
2013-10-25      2
2013-11-27      3 
2013-11-28      4
2013-12-01      5 
2013-12-02      6

When I call pd.rolling_std(df.resample('1M'), window=1), the result is:

2013-10    NAN
2013-11    NAN 
2013-12    NAN
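One likely reason for these NaN rows: pandas computes the sample standard deviation (ddof=1) by default, whereas np.std computes the population standard deviation (ddof=0). After resampling to one value per month, a one-month window holds a single value, whose sample standard deviation is undefined. The snippet illustrates this, using the modern `.rolling()` method as a stand-in for the old `pd.rolling_std`.

```python
import numpy as np
import pandas as pd

october = pd.Series([1, 2])

# Series.std defaults to the sample standard deviation (divides by n - 1)
print(october.std())                 # 0.7071067811865476
# np.std defaults to the population standard deviation (divides by n)
print(np.std(october.to_numpy()))    # 0.5
print(october.std(ddof=0))           # 0.5

# With one (resampled) value per month, a window of 1 has a single value,
# so the default ddof=1 standard deviation is NaN for every row
monthly_means = pd.Series([1.5, 3.5, 5.5])
print(monthly_means.rolling(1).std().tolist())   # [nan, nan, nan]
```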

What I really want is a DataFrame like this (window = 1):

2013-10    0.5
2013-11    0.5 
2013-12    0.5

The first 0.5 is the standard deviation of October's values, np.std([1, 2]). The other two 0.5 values come from [3, 4] and [5, 6] in the same way. However, no matter which how='xxx' method I pass to resample, the result is not right. The expected result for a 2-month window is:

df (window = 2):

2013-10    NAN
2013-11    1.1180 
2013-12    1.1180

The first 1.1180 is np.std([1, 2, 3, 4]), using the values from October and November. The 1.1180 for 2013-12 is np.std([3, 4, 5, 6]), using the values from November and December.
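A sketch that reproduces these expected values without a Python loop, assuming the population standard deviation (ddof=0, as np.std computes) and windows counted in calendar months. `rolling_month_std` is a hypothetical name. Instead of resampling the values themselves, it sums per-month counts, sums and sums of squares, rolls those sums over the last n months, and derives the variance as E[x^2] - E[x]^2.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6],
              index=pd.to_datetime(['2013-10-24', '2013-10-25', '2013-11-27',
                                    '2013-11-28', '2013-12-01', '2013-12-02']))


def rolling_month_std(s, n_months):
    """Hypothetical helper: population std of the raw values in n-month windows."""
    # Per-calendar-month sufficient statistics: count, sum and sum of squares
    stats = pd.DataFrame({'n': 1, 'x': s, 'x2': s ** 2})
    monthly = stats.groupby(s.index.to_period('M')).sum()
    # Add zero rows for months without data so each row is one calendar month
    full = pd.period_range(monthly.index.min(), monthly.index.max(), freq='M')
    monthly = monthly.reindex(full, fill_value=0)
    # Sum the statistics over the last n_months rows; earlier rows become NaN
    win = monthly.rolling(n_months).sum()
    mean = win['x'] / win['n']
    var = win['x2'] / win['n'] - mean ** 2   # population variance (ddof=0)
    return np.sqrt(var.clip(lower=0))        # clip guards against tiny negative rounding


print(rolling_month_std(s, 1).tolist())   # [0.5, 0.5, 0.5]
print(rolling_month_std(s, 2).tolist())   # [nan, 1.118..., 1.118...]
```

The sum-of-squares formula is compact but can lose precision when the values are large relative to their spread; calling np.std on each window's raw values avoids that, at the cost of a Python loop.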

P.S. The standard deviation is just one of the functions I want to apply in this rolling way. Thank you!

asked Oct 30 '22 16:10 by diaosihuai


1 Answer

You can use pd.to_datetime on the date strings to create a DatetimeIndex, which lets you resample the data by month:

import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6]},
                  index=['2013-10-24', '2013-10-25', '2013-11-27',
                         '2013-11-28', '2013-12-01', '2013-12-02'])
# Convert the string index to a DatetimeIndex so it can be resampled by month
df.index = pd.to_datetime(df.index)

>>> pd.rolling_mean(df.resample('1M'), 1, freq='1M')
            value
2013-10-31    1.5
2013-11-30    3.5
2013-12-31    5.5

>>> pd.rolling_mean(df.resample('2M'), window=1, freq='1M')
            value
2013-10-31    1.5
2013-11-30    NaN
2013-12-31    4.5

>>> pd.rolling_mean(df.resample('1M'), window=2, freq='1M')
            value
2013-10-31    NaN
2013-11-30    2.5
2013-12-31    4.5

>>> pd.rolling_mean(df.resample('1M'), window=3, freq='1M')
            value
2013-10-31    NaN
2013-11-30    NaN
2013-12-31    3.5

>>> pd.rolling_mean(df.resample('1M'), window=12, freq='1M')
            value
2013-10-31    NaN
2013-11-30    NaN
2013-12-31    NaN
answered Nov 15 '22 03:11 by Alexander
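`pd.rolling_mean` and its `freq=` argument were deprecated in pandas 0.18 and removed in later releases, and `resample()` now returns a deferred object that needs an explicit aggregation such as `.mean()`. A sketch of the same monthly computations for a recent pandas version, assuming windows counted in calendar months. Grouping on `to_period('M')` is used here instead of `resample` so the code does not depend on the month-end offset alias, which newer releases spell 'ME'.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6]},
                  index=pd.to_datetime(['2013-10-24', '2013-10-25', '2013-11-27',
                                        '2013-11-28', '2013-12-01', '2013-12-02']))

# One mean per calendar month, indexed by Period (2013-10, 2013-11, 2013-12)
monthly = df.groupby(df.index.to_period('M')).mean()
# Add NaN rows for months without data so a window always spans calendar months;
# with the default min_periods, any window containing such a month gives NaN
monthly = monthly.reindex(pd.period_range(monthly.index.min(),
                                          monthly.index.max(), freq='M'))

# rolling(window) over the monthly rows replaces pd.rolling_mean(..., window)
results = {w: monthly['value'].rolling(w).mean() for w in (1, 2, 3, 12)}
for w, series in results.items():
    print(w, series.tolist())
# 1 [1.5, 3.5, 5.5]
# 2 [nan, 2.5, 4.5]
# 3 [nan, nan, 3.5]
# 12 [nan, nan, nan]
```

This averages the monthly means; it equals the mean of the raw values here only because every month has exactly two observations.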