I create a Pandas dataframe df:
df.head()
Out[1]:
A B DateTime
2010-01-01 50.662365 101.035099 2010-01-01
2010-01-02 47.652424 99.274288 2010-01-02
2010-01-03 51.387459 99.747135 2010-01-03
2010-01-04 52.344788 99.621896 2010-01-04
2010-01-05 47.106364 98.286224 2010-01-05
I can add a moving average of column A:
df['A_moving_average'] = df.A.rolling(window=50, axis="rows") \
.apply(lambda x: np.mean(x))
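As a side note on the code above: for a plain mean, the lambda is not needed. Rolling objects have a built-in mean() that computes the same values and is usually faster. Below is a minimal sketch on synthetic data. The seed and row count are arbitrary, and raw=True only makes pandas pass each window to the lambda as a NumPy array.

```python
import numpy as np
import pandas as pd

# Synthetic frame shaped like the question's df (values are illustrative only)
count = 100
dates = pd.date_range('1/1/2010', periods=count, freq='D')
rng = np.random.default_rng(0)
df = pd.DataFrame({'DateTime': dates,
                   'A': rng.normal(50, 2, count),
                   'B': rng.normal(100, 4, count)}, index=dates)

# Custom function applied to each 50-row window of column A
via_apply = df['A'].rolling(window=50).apply(lambda x: np.mean(x), raw=True)

# Built-in rolling mean over the same windows
via_mean = df['A'].rolling(window=50).mean()

# Both leave the first 49 rows as NaN and agree elsewhere up to float rounding
print(np.allclose(via_apply.dropna(), via_mean.dropna()))  # True
```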
Question: how do I add a moving average of columns A and B?
I expected this to work, but it raises an error:
df['A_B_moving_average'] = df.rolling(window=50, axis="rows") \
.apply(lambda row: (np.mean(row.A) + np.mean(row.B)) / 2)
The error is:
NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented
Here is how I created the test Pandas dataframe df:
import numpy.random as rnd
import pandas as pd
import numpy as np
count = 1000
dates = pd.date_range('1/1/2010', periods=count, freq='D')
df = pd.DataFrame(
{
'DateTime': dates,
'A': rnd.normal(50, 2, count), # Mean 50, standard deviation 2
'B': rnd.normal(100, 4, count) # Mean 100, standard deviation 4
}, index=dates
)
I couldn't find a direct solution to the general problem of using multiple columns in rolling, but in your specific case you can simply take the mean of columns A and B first and then apply rolling to that:
df['A_B_moving_average'] = ((df.A + df.B) / 2).rolling(window=50).mean()
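Because the mean is linear, the rolling mean of (A + B) / 2 over a window equals the average of the separate rolling means of A and B, which is what the lambda in the question was trying to compute. A small sketch checks this on synthetic data (the seed and row count are arbitrary). It also shows that dropping the DateTime column with select_dtypes lets DataFrame.rolling run without the datetime error, although each column is then still rolled on its own.

```python
import numpy as np
import pandas as pd

# Synthetic data shaped like the question's df (values are illustrative only)
count = 200
dates = pd.date_range('1/1/2010', periods=count, freq='D')
rng = np.random.default_rng(0)
df = pd.DataFrame({'DateTime': dates,
                   'A': rng.normal(50, 2, count),
                   'B': rng.normal(100, 4, count)}, index=dates)

# The answer's approach: average the two columns first, then roll
answer = ((df.A + df.B) / 2).rolling(window=50).mean()

# The question's intent: roll each column, then average the two rolling means.
# select_dtypes keeps only numeric columns, dropping the datetime64 DateTime column.
per_column = df.select_dtypes(include='number').rolling(window=50).mean()
intended = (per_column['A'] + per_column['B']) / 2

# Same NaN prefix (first 49 rows) and the same values afterwards, up to float rounding
print(np.allclose(answer.dropna(), intended.dropna()))  # True
```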
Just as an explanation: if you apply rolling to the whole DataFrame along the rows (axis='rows', which is also the default), each column is processed separately. So:
df['A_B_moving_average'] = df.rolling(window=5).mean()
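will first evaluate the rolling window for A (works), then for B (works), and then for DateTime (doesn't work, hence the error). Also, each rolling window holds the values of a single column only (a plain NumPy array with raw=True, or a Series with the default raw=False in newer pandas versions), so you can't access the other columns by name. Just as a demonstration, using prints: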
import numpy.random as rnd
import pandas as pd
import numpy as np
count = 10
dates = pd.date_range('1/1/2010', periods=count, freq='D')
df = pd.DataFrame(
{
'DateTime': dates,
'A': rnd.normal(50, 2, count), # Mean 50, standard deviation 2
'B': rnd.normal(100, 4, count) # Mean 100, standard deviation 4
}, index=dates
)
df[['A', 'B']].rolling(window=6).apply(lambda row: print(row) or np.max(row), raw=True)
prints:
[ 47.32327354 48.12322447 50.86806381 49.3676319 47.81335338
49.66915104]
[ 48.12322447 50.86806381 49.3676319 47.81335338 49.66915104
48.01520798]
[ 50.86806381 49.3676319 47.81335338 49.66915104 48.01520798
48.14089864]
[ 49.3676319 47.81335338 49.66915104 48.01520798 48.14089864
51.89999973]
[ 47.81335338 49.66915104 48.01520798 48.14089864 51.89999973
48.76838054]
[ 100.10662696 96.72411985 103.24600664 95.03841539 95.23430836
102.30955102]
[ 96.72411985 103.24600664 95.03841539 95.23430836 102.30955102
95.18273088]
[ 103.24600664 95.03841539 95.23430836 102.30955102 95.18273088
97.36751546]
[ 95.03841539 95.23430836 102.30955102 95.18273088 97.36751546
99.25325622]
[ 95.23430836 102.30955102 95.18273088 97.36751546 99.25325622
105.16747544]
The first five arrays are from column A and the last five from column B, and all of them are plain NumPy arrays.
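For the general case, where the window function really needs several columns by name, one workaround is to step through the rows yourself and hand each window to the function as a DataFrame slice. The sketch below uses a hypothetical helper, rolling_multi, which is not part of pandas. It assumes fixed-size trailing windows like rolling(window=n) on data without missing values, and it runs a Python-level loop, so it will be much slower than the built-in rolling methods on large frames.

```python
import numpy as np
import pandas as pd

def rolling_multi(frame, window, func):
    """Hypothetical helper (not a pandas API): apply func to each trailing window.

    func receives a DataFrame slice of `window` rows, so columns can be
    accessed by name, and must return a scalar. Rows before the first full
    window get NaN, similar to rolling(window) on data without missing values.
    """
    values = [np.nan] * len(frame)
    for end in range(window - 1, len(frame)):
        # Slice of exactly `window` rows ending at position `end`
        values[end] = func(frame.iloc[end - window + 1:end + 1])
    return pd.Series(values, index=frame.index, dtype=float)

# Small synthetic frame shaped like the question's df (values are illustrative only)
count = 10
dates = pd.date_range('1/1/2010', periods=count, freq='D')
rng = np.random.default_rng(2)
df = pd.DataFrame({'DateTime': dates,
                   'A': rng.normal(50, 2, count),
                   'B': rng.normal(100, 4, count)}, index=dates)

# The question's original lambda works here because each window is a DataFrame
df['A_B_moving_average'] = rolling_multi(
    df, 6, lambda w: (np.mean(w.A) + np.mean(w.B)) / 2)
print(df['A_B_moving_average'])
```

For this particular function the result should match ((df.A + df.B) / 2).rolling(window=6).mean(); the helper only pays off when the window function cannot be rewritten as a per-column operation.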