
How to use .rolling() on each row of a Pandas dataframe?

I create a Pandas dataframe df:

df.head()
Out[1]: 
                    A           B   DateTime 
2010-01-01  50.662365  101.035099 2010-01-01             
2010-01-02  47.652424   99.274288 2010-01-02            
2010-01-03  51.387459   99.747135 2010-01-03               
2010-01-04  52.344788   99.621896 2010-01-04               
2010-01-05  47.106364   98.286224 2010-01-05               

I can add a moving average of column A:

df['A_moving_average'] = df.A.rolling(window=50, axis="rows") \
                             .apply(lambda x: np.mean(x))

Question: how do I add a moving average of columns A and B?

This should work, but it gives an error:

df['A_B_moving_average'] = df.rolling(window=50, axis="rows") \
                             .apply(lambda row: (np.mean(row.A) + np.mean(row.B)) / 2)

The error is:

NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented

Appendix A: Code to create Pandas dataframe

Here is how I created the test Pandas dataframe df:

import numpy.random as rnd
import pandas as pd
import numpy as np

count = 1000

dates = pd.date_range('1/1/2010', periods=count, freq='D')

df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rnd.normal(50, 2, count), # Mean 50, standard deviation 2
        'B': rnd.normal(100, 4, count) # Mean 100, standard deviation 4
    }, index=dates
)
Contango asked Aug 03 '17 09:08



1 Answer

I couldn't find a direct solution to the general problem of using multiple columns inside a rolling function. In your specific case, though, you can take the row-wise mean of columns A and B first and then apply the rolling mean to the result:

df['A_B_moving_average'] = ((df.A + df.B) / 2).rolling(window=50, axis='rows').mean()
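To check that this gives what the question asks for, here is a minimal sketch (not part of the original answer) comparing it with rolling each column separately and then averaging the two rolling means. Because the mean is linear, both should agree. The fixed seed, the use of `np.random.default_rng`, and the names `combined_first` and `rolled_first` are my own choices for illustration. The `axis` argument is left out because rows are the default, and I believe recent pandas releases deprecate it.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator is used only so the example is reproducible.
rng = np.random.default_rng(0)
count = 1000
dates = pd.date_range('1/1/2010', periods=count, freq='D')

df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rng.normal(50, 2, count),   # mean 50, standard deviation 2
        'B': rng.normal(100, 4, count),  # mean 100, standard deviation 4
    },
    index=dates,
)

# Approach from the answer: average A and B row by row, then roll.
combined_first = ((df['A'] + df['B']) / 2).rolling(window=50).mean()

# Alternative: roll only the numeric columns, then average the two rolling means.
rolled_first = df[['A', 'B']].rolling(window=50).mean().mean(axis=1)

df['A_B_moving_average'] = combined_first

# Both routes agree (NaN for the first 49 rows, where the window is incomplete).
print(np.allclose(combined_first, rolled_first, equal_nan=True))
```

Selecting `df[['A', 'B']]` before rolling also keeps the DateTime column out of the calculation, which is what triggers the error in the question.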

As an explanation: if you call rolling on the whole DataFrame with axis='rows', each column is processed separately. So:

df.rolling(window=5, axis='rows').mean()

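will first evaluate the rolling window for A (works), then for B (works), and then for DateTime (doesn't work, hence the error). With apply, each rolling window is handed to your function as a plain one-dimensional NumPy array holding the values of a single column, so you can't access the "column names" (row.A and row.B do not exist). This is the behaviour of the pandas version I used; newer versions pass a Series unless you give raw=True, but it is still one column at a time. As a demonstration using prints: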

import numpy.random as rnd
import pandas as pd
import numpy as np
count = 10
dates = pd.date_range('1/1/2010', periods=count, freq='D')
df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rnd.normal(50, 2, count), # Mean 50, standard deviation 2
        'B': rnd.normal(100, 4, count) # Mean 100, standard deviation 4
    }, index=dates
)
df[['A', 'B']].rolling(window=6, axis='rows').apply(lambda row: print(row) or np.max(row))

prints:

[ 47.32327354  48.12322447  50.86806381  49.3676319   47.81335338
  49.66915104]
[ 48.12322447  50.86806381  49.3676319   47.81335338  49.66915104
  48.01520798]
[ 50.86806381  49.3676319   47.81335338  49.66915104  48.01520798
  48.14089864]
[ 49.3676319   47.81335338  49.66915104  48.01520798  48.14089864
  51.89999973]
[ 47.81335338  49.66915104  48.01520798  48.14089864  51.89999973
  48.76838054]
[ 100.10662696   96.72411985  103.24600664   95.03841539   95.23430836
  102.30955102]
[  96.72411985  103.24600664   95.03841539   95.23430836  102.30955102
   95.18273088]
[ 103.24600664   95.03841539   95.23430836  102.30955102   95.18273088
   97.36751546]
[  95.03841539   95.23430836  102.30955102   95.18273088   97.36751546
   99.25325622]
[  95.23430836  102.30955102   95.18273088   97.36751546   99.25325622
  105.16747544]

The first five windows come from column A and the last five from column B, and all of them are plain arrays.
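The same point can be checked without reading printed output. The following is a small sketch under my own assumptions: the seed, the helper name `window_max`, and the `seen_types` list are hypothetical and only for illustration. It uses `select_dtypes` to drop the datetime column and passes `raw=True` to `apply` so that each window arrives as a NumPy array regardless of the pandas default.

```python
import numpy as np
import pandas as pd

# Assumption: seeded generator for reproducibility only.
rng = np.random.default_rng(0)
count = 10
dates = pd.date_range('1/1/2010', periods=count, freq='D')

df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rng.normal(50, 2, count),
        'B': rng.normal(100, 4, count),
    },
    index=dates,
)

# Keep only numeric columns so rolling never touches the datetime64 column.
numeric = df.select_dtypes(include='number')

seen_types = []  # records the type of every window passed to the function

def window_max(window):
    # Hypothetical helper: each call receives one window of ONE column.
    seen_types.append(type(window).__name__)
    return np.max(window)

# raw=True asks pandas to pass plain NumPy arrays instead of Series.
rolled = numeric.rolling(window=6).apply(window_max, raw=True)

print(list(rolled.columns))  # the columns that were rolled
print(set(seen_types))       # the window types the function saw
```

Since the function only ever sees one column at a time, anything that needs A and B together has to combine the columns either before rolling or after it.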

MSeifert answered Nov 09 '22 23:11