I would like to compute the 1-year rolling average for each row in this DataFrame test:
index id date variation
2313 7034 2018-03-14 4.139148e-06
2314 7034 2018-03-13 4.953194e-07
2315 7034 2018-03-12 2.854749e-06
2316 7034 2018-03-09 3.907458e-06
2317 7034 2018-03-08 1.662412e-06
2318 7034 2018-03-07 1.346433e-06
2319 7034 2018-03-06 8.731700e-06
2320 7034 2018-03-05 7.145597e-06
2321 7034 2018-03-02 4.893283e-06
...
For example, I would need to calculate:
the average for id 7034 between 2018-03-14 and 2017-08-14
the average for id 7034 between 2018-03-13 and 2017-08-13
I tried:
test.groupby(['id','date'])['variation'].rolling(window=1,freq='Y',on='date').mean()
but I got the error message:
ValueError: invalid on specified as date, must be a column (if DataFrame) or None
How can I use the pandas rolling() function in this case?
[EDIT 1] [thanks to Sacul]
I tested:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date').groupby('id').rolling(window=1, freq='Y').mean()['variation']
But freq='Y' doesn't work (I got: ValueError: Invalid frequency: Y). Then I used window = 365, freq = 'D'.
But there is another issue: because there are never 365 consecutive dates for each id-date combination, the result is always empty. Even if there are missing dates, I would like to ignore them and use all dates between the current date and (current date - 365 days) to compute the rolling mean. For instance, imagine I have:
index id date variation
2313 7034 2018-03-14 4.139148e-06
2314 7034 2018-03-13 4.953194e-07
2315 7034 2017-03-13 2.854749e-06
Then,
How can I do that?
[EDIT 2]
Finally, I used the formulas below to calculate the rolling median, mean and standard deviation over 1 year while ignoring missing values:
pd.rolling_median(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
pd.rolling_mean(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
pd.rolling_std(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
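For readers on a current pandas: the module-level pd.rolling_median, pd.rolling_mean and pd.rolling_std functions were removed in later releases. Below is a minimal sketch of the same idea, assuming a time-based window of "365D" on a sorted datetime index. The sample frame is the three-row example from the question, and the variable names are only illustrative.

```python
import pandas as pd

# Three-row example from the question: two recent dates and one a year earlier.
df = pd.DataFrame({
    "id": [7034, 7034, 7034],
    "date": ["2018-03-14", "2018-03-13", "2017-03-13"],
    "variation": [4.139148e-06, 4.953194e-07, 2.854749e-06],
})
df["date"] = pd.to_datetime(df["date"])

# Time-based windows need a sorted DatetimeIndex.
grouped = df.set_index("date").sort_index().groupby("id")["variation"]

# "365D" covers the 365 days ending at each row's date. Missing days are
# simply absent from the window, and min_periods=1 keeps short windows
# from producing NaN for the mean and median.
rolling = grouped.rolling("365D", min_periods=1)
result = pd.DataFrame({
    "mean": rolling.mean(),
    "median": rolling.median(),
    "std": rolling.std(),
})
print(result)
```

The result is indexed by (id, date), so each original row gets its own trailing statistics. Note that the standard deviation of a window holding a single observation is still NaN.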
To get a column average from a pandas DataFrame, use either the mean() or the describe() method. DataFrame.mean() returns the mean of the values along the requested axis.
In pandas, a moving average can be calculated with the .rolling() method. It provides rolling windows over the data, and calling mean() on these windows yields the moving average. The window size is passed as a parameter to .rolling().
The min_periods argument specifies the minimum number of observations in the current window required to produce a value; otherwise, the result is NaN.
A rolling mean is simply the mean of a certain number of previous periods in a time series. To calculate it for a column in a pandas DataFrame, use the following syntax: df['column_name'].rolling(rolling_window).mean()
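As a small illustration of the window size and min_periods, here is a sketch on a made-up five-value Series (the data and variable names are hypothetical):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3 observations: the first two results are NaN because, for an
# integer window, min_periods defaults to the window size.
strict = s.rolling(3).mean()

# With min_periods=1, the partial windows at the start still produce a value.
relaxed = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"value": s, "strict": strict, "relaxed": relaxed}))
```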
I believe this should work for you:
# First make sure that `date` is a datetime object:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date').groupby('id').rolling(window=1, freq='A').mean()['variation']
Using pd.DataFrame.rolling with datetimes works well when the date is the index, which is why I used df.set_index('date') (as can be seen in one of the documentation's examples).
I can't really test if it works on the year's average on your example dataframe, as there is only one year and only one ID, but it should work.
[EDIT] As pointed out by Mihai-Andrei Dinculescu, freq is now a deprecated argument. Here is an alternative (and probably more future-proof) way to do what you're looking for:
df.set_index('date').groupby('id')['variation'].resample('A').mean()
You can take a look at the resample documentation for more details on how this works, and at the pandas documentation on frequency aliases regarding the frequency arguments.
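One thing worth keeping in mind: resampling by year gives one mean per id and calendar year, not a trailing 1-year value for every row. The sketch below shows that calendar-year grouping on hypothetical data. It groups on the year extracted from the date column rather than calling resample, purely as an assumption-light way to avoid frequency aliases, whose names have changed between pandas versions (for example, 'A' was later renamed 'YE').

```python
import pandas as pd

# Hypothetical data: one id with observations in two calendar years.
df = pd.DataFrame({
    "id": [7034, 7034, 7034, 7034],
    "date": pd.to_datetime(
        ["2017-03-13", "2017-11-02", "2018-03-13", "2018-03-14"]
    ),
    "variation": [1.0, 3.0, 2.0, 6.0],
})

# One mean per (id, calendar year): two output rows for four input rows.
year = df["date"].dt.year.rename("year")
yearly = df.groupby(["id", year])["variation"].mean()
print(yearly)
```

If a value per row over the preceding 365 days is what you need, a time-based rolling window on the datetime index is the closer fit.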