1 Year Rolling mean pandas on column date

Tags: python, pandas, rolling-computation, rolling-average

I would like to compute the 1-year rolling average for each row in this DataFrame, test:

index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2018-03-12  2.854749e-06
2316    7034    2018-03-09  3.907458e-06
2317    7034    2018-03-08  1.662412e-06
2318    7034    2018-03-07  1.346433e-06
2319    7034    2018-03-06  8.731700e-06
2320    7034    2018-03-05  7.145597e-06
2321    7034    2018-03-02  4.893283e-06
...

For example, I would need to calculate:

mean of variation of id 7034 between 2018-03-14 and 2017-03-14
mean of variation of id 7034 between 2018-03-13 and 2017-03-13
etc.

I tried:

test.groupby(['id','date'])['variation'].rolling(window=1,freq='Y',on='date').mean()

but I got the error message:

ValueError: invalid on specified as date, must be a column (if DataFrame) or None

How can I use the pandas rolling() function in this case?

[EDIT 1] [thanks to Sacul]

I tested:

df['date'] = pd.to_datetime(df['date'])

df.set_index('date').groupby('id').rolling(window=1, freq='Y').mean()['variation']

But freq='Y' doesn't work (I got: ValueError: Invalid frequency: Y), so I used window=365, freq='D' instead.

But there is another issue: because there are never 365 consecutive dates for any id-date combination, the result is always empty. Even if there are missing dates, I would like to ignore them and use all dates between the current date and (current date - 365 days) to compute the rolling mean. For instance, suppose I have:

index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2017-03-13  2.854749e-06

Then,

for 7034 2018-03-14: I would like to compute MEAN(4.139148e-06, 4.953194e-07, 2.854749e-06)
for 7034 2018-03-13: I would also like to compute MEAN(4.139148e-06, 4.953194e-07, 2.854749e-06)

How can I do that?

[EDIT 2]

Finally, I used the calls below to compute the 1-year rolling median, mean and standard deviation while ignoring missing dates:

pd.rolling_median(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

pd.rolling_mean(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

pd.rolling_std(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
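
As far as I know, the top-level pd.rolling_* functions and the freq argument of rolling() are no longer available in newer pandas versions. Below is a minimal sketch of an equivalent, assuming a time-offset window ("365D") on a sorted datetime index. The toy frame reuses the three rows from EDIT 1, and the variable names are only illustrative:

```python
import pandas as pd

# Toy data: the three rows from EDIT 1 of the question.
df = pd.DataFrame({
    "id": [7034, 7034, 7034],
    "date": ["2018-03-14", "2018-03-13", "2017-03-13"],
    "variation": [4.139148e-06, 4.953194e-07, 2.854749e-06],
})
df["date"] = pd.to_datetime(df["date"])

# Offset windows need a monotonic datetime index, so sort by date first.
indexed = df.sort_values("date").set_index("date")

# "365D" selects every row of the same id dated within the last 365 days,
# regardless of how many dates are missing in between.
rolled = indexed.groupby("id")["variation"].rolling("365D", min_periods=1)

rolling_mean = rolled.mean()
rolling_median = rolled.median()
rolling_std = rolled.std()  # NaN where the window holds a single row

print(rolling_mean)
```

With the default right-closed window, each row averages the dates in (date - 365 days, date], so the window only looks backward, and the 2017-03-13 row falls just outside the window of 2018-03-13. Passing closed="both" to rolling() should include that boundary date.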


asked Mar 20 '18 15:03

Thomas

1 Answer

I believe this should work for you:

# First make sure that `date` is a datetime object:

df['date'] = pd.to_datetime(df['date'])

df.set_index('date').groupby('id').rolling(window=1, freq='A').mean()['variation']

Using pd.DataFrame.rolling with datetimes works well when the date is the index, which is why I used df.set_index('date') (as can be seen in one of the documentation's examples).

I can't really test if it works on the year's average on your example dataframe, as there is only one year and only one ID, but it should work.

Arguably Better Solution:

[EDIT] As pointed out by Mihai-Andrei Dinculescu, freq is now a deprecated argument. Here is an alternative (and probably more future-proof) way to do what you're looking for:

df.set_index('date').groupby('id')['variation'].resample('A').mean()

You can take a look at the resample documentation for more details on how this works, and at the pandas list of offset aliases for the frequency arguments.
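
Note that a yearly resample gives one mean per id and calendar year, not a trailing 1-year mean for each row. Below is a minimal sketch of that behaviour on the rows from the question's EDIT 1. It groups on the year of the date as a stand-in for the year-end resample, so it does not depend on the alias spelling (recent pandas versions appear to prefer 'YE' over 'A'). The frame and variable names are illustrative:

```python
import pandas as pd

# Toy data: the three rows from EDIT 1 of the question.
df = pd.DataFrame({
    "id": [7034, 7034, 7034],
    "date": pd.to_datetime(["2018-03-14", "2018-03-13", "2017-03-13"]),
    "variation": [4.139148e-06, 4.953194e-07, 2.854749e-06],
})

# One mean per (id, calendar year): the same buckets a year-end resample uses.
yearly_mean = df.groupby(["id", df["date"].dt.year])["variation"].mean()

print(yearly_mean)
```

The result has two rows here (2017 and 2018) for three input rows, so it answers a different question than a per-row rolling window.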


answered Sep 26 '22 06:09

sacuL
