How to use .rolling() on each row of a Pandas dataframe?

Tags: python, pandas, numpy, time-series

I created a Pandas DataFrame df:

df.head()
Out[1]:
                    A           B   DateTime
2010-01-01  50.662365  101.035099 2010-01-01
2010-01-02  47.652424   99.274288 2010-01-02
2010-01-03  51.387459   99.747135 2010-01-03
2010-01-04  52.344788   99.621896 2010-01-04
2010-01-05  47.106364   98.286224 2010-01-05

I can add a moving average of column A:

df['A_moving_average'] = df.A.rolling(window=50, axis="rows") \
                             .apply(lambda x: np.mean(x))
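For a plain mean, the apply call can be replaced by the built-in Rolling.mean(), which gives the same values without calling a Python function for every window. A minimal sketch, assuming a frame shaped like the one in Appendix A (the seed is arbitrary and only makes the comparison repeatable):

```python
import numpy as np
import pandas as pd

# Same shape as the Appendix A frame; the seed is arbitrary
count = 1000
dates = pd.date_range('1/1/2010', periods=count, freq='D')
rng = np.random.default_rng(0)
df = pd.DataFrame({'A': rng.normal(50, 2, count),
                   'B': rng.normal(100, 4, count)}, index=dates)

# Custom function through apply; raw=True hands np.mean a plain NumPy array
via_apply = df['A'].rolling(window=50).apply(np.mean, raw=True)

# Built-in rolling mean: same values, no Python call per window
via_mean = df['A'].rolling(window=50).mean()

# The first 49 rows are NaN in both, because the window is not yet full
print(via_apply.isna().sum(), via_mean.isna().sum())  # 49 49
print(np.allclose(via_apply.dropna(), via_mean.dropna()))  # True
```

The apply form is only needed when the per-window function has no built-in equivalent.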

Question: how do I add a moving average that combines columns A and B, i.e. the rolling mean of their row-wise average?

I expected this to work, but it raises an error:

df['A_B_moving_average'] = df.rolling(window=50, axis="rows") \
                             .apply(lambda row: (np.mean(row.A) + np.mean(row.B)) / 2)

The error is:

NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented

Appendix A: Code to create Pandas dataframe

Here is how I created the test Pandas dataframe df:

import numpy.random as rnd
import pandas as pd
import numpy as np

count = 1000

dates = pd.date_range('1/1/2010', periods=count, freq='D')

df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rnd.normal(50, 2, count), # Mean 50, standard deviation 2
        'B': rnd.normal(100, 4, count) # Mean 100, standard deviation 4
    }, index=dates
)


asked Aug 03 '17 09:08

Contango

1 Answer

I couldn't find a direct solution to the general problem of using multiple columns inside a single rolling window function, but in your specific case you can first take the row-wise mean of columns A and B and then apply the rolling mean to that:

df['A_B_moving_average'] = ((df.A + df.B) / 2).rolling(window=50).mean()
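This shortcut relies on the mean being linear: the rolling mean of (A + B) / 2 equals the average of the separate rolling means of A and B. A quick check, assuming a frame built as in Appendix A (seeded arbitrarily so it is repeatable), which also shows that rolling only the numeric columns sidesteps the DateTime error:

```python
import numpy as np
import pandas as pd

# Frame shaped like Appendix A, including the DateTime column; arbitrary seed
count = 1000
dates = pd.date_range('1/1/2010', periods=count, freq='D')
rng = np.random.default_rng(1)
df = pd.DataFrame({'DateTime': dates,
                   'A': rng.normal(50, 2, count),
                   'B': rng.normal(100, 4, count)}, index=dates)

# Approach above: average A and B per row, then take the rolling mean
average_then_roll = ((df['A'] + df['B']) / 2).rolling(window=50).mean()

# Alternative: roll only the numeric columns (leaving DateTime out avoids
# the datetime64 error), then average the two rolling means per row
roll_then_average = df[['A', 'B']].rolling(window=50).mean().mean(axis=1)

print(np.allclose(average_then_roll.dropna(), roll_then_average.dropna()))  # True
```

The equivalence holds for linear statistics such as the mean or the sum; it does not hold for, say, a rolling max or a rolling standard deviation.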

By way of explanation: when you call rolling on the whole DataFrame with axis='rows', each column is processed separately. So:

df['A_B_moving_average'] = df.rolling(window=5, axis='rows').mean()

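will first evaluate the rolling window for A (works), then for B (works), and then for DateTime (doesn't work, hence the error). Moreover, each window passed to the function holds the values of a single column only, so you can't access other columns by name: before pandas 1.0 it is a plain NumPy array, and from 1.0 onwards apply passes a one-column Series unless raw=True is given. As a demonstration, using print: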

import numpy.random as rnd
import pandas as pd
import numpy as np
count = 10
dates = pd.date_range('1/1/2010', periods=count, freq='D')
df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rnd.normal(50, 2, count), # Mean 50, standard deviation 2
        'B': rnd.normal(100, 4, count) # Mean 100, standard deviation 4
    }, index=dates
)
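# raw=True passes each window as a plain NumPy array (the only behavior
# in older pandas), which is what the output below shows
df[['A', 'B']].rolling(window=6).apply(lambda window: print(window) or np.max(window), raw=True)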

prints:

[ 47.32327354  48.12322447  50.86806381  49.3676319   47.81335338
  49.66915104]
[ 48.12322447  50.86806381  49.3676319   47.81335338  49.66915104
  48.01520798]
[ 50.86806381  49.3676319   47.81335338  49.66915104  48.01520798
  48.14089864]
[ 49.3676319   47.81335338  49.66915104  48.01520798  48.14089864
  51.89999973]
[ 47.81335338  49.66915104  48.01520798  48.14089864  51.89999973
  48.76838054]
[ 100.10662696   96.72411985  103.24600664   95.03841539   95.23430836
  102.30955102]
[  96.72411985  103.24600664   95.03841539   95.23430836  102.30955102
   95.18273088]
[ 103.24600664   95.03841539   95.23430836  102.30955102   95.18273088
   97.36751546]
[  95.03841539   95.23430836  102.30955102   95.18273088   97.36751546
   99.25325622]
[  95.23430836  102.30955102   95.18273088   97.36751546   99.25325622
  105.16747544]

The first five windows come from column A and the last five from column B, and each one is a plain array holding values from a single column only.
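For the general problem, where the window function really needs several columns at once, one workaround is to bypass rolling and slice the rows by position yourself. This is a sketch under stated assumptions rather than a pandas feature: the helper name rolling_multi is hypothetical, and its Python-level loop is much slower than the built-in rolling methods on large frames.

```python
import numpy as np
import pandas as pd


def rolling_multi(frame, window, func):
    """Hypothetical helper (not part of pandas): call func on every full
    window of `window` consecutive rows, passed as a DataFrame slice, and
    return the results as a Series aligned with frame.index. Rows before
    the first full window are NaN, as with rolling()'s default min_periods."""
    results = [np.nan] * min(window - 1, len(frame))
    for end in range(window, len(frame) + 1):
        results.append(func(frame.iloc[end - window:end]))
    return pd.Series(results, index=frame.index)


# Small frame shaped like the demonstration above; arbitrary seed
count = 10
dates = pd.date_range('1/1/2010', periods=count, freq='D')
rng = np.random.default_rng(3)
df = pd.DataFrame({'DateTime': dates,
                   'A': rng.normal(50, 2, count),
                   'B': rng.normal(100, 4, count)}, index=dates)

# Columns are accessible by name inside func, as the question wanted
avg = rolling_multi(df, 6, lambda win: (win['A'].mean() + win['B'].mean()) / 2)

# A statistic that genuinely needs both columns at once: rolling correlation
corr = rolling_multi(df, 6, lambda win: win['A'].corr(win['B']))
```

For the pairwise correlation used here, pandas also offers df['A'].rolling(window=6).corr(df['B']) directly; a helper like this is only worth it when no built-in covers the statistic you need.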

answered Nov 09 '22 23:11

MSeifert
