
Pandas rolling apply function to entire window dataframe

I want to apply a function to a rolling window. All the answers I saw here are focused on applying to a single row / column, but I would like to apply my function to the entire window. Here is a simplified example:

import pandas as pd
data = [ [1,2], [3,4], [3,4], [6,6], [9,1], [11,2] ]
df = pd.DataFrame(columns=list('AB'), data=data)

This is df:

    A   B
0   1   2
1   3   4
2   3   4
3   6   6
4   9   1
5   11  2

Take some function to apply to the entire window:

df.rolling(3).apply(lambda x: x.shape)

In this example, I would like to get something like:

    some_name   
0   NA  
1   NA  
2   (3,2)   
3   (3,2)   
4   (3,2)   
5   (3,2)   

Of course, the shape is only an example, showing that the function treats the entire window as the object of calculation, not just a single row / column. I tried the axis keyword of rolling, as well as the raw keyword of apply, but with no success. Other methods (agg, transform) do not seem to deliver either.
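A quick way to see why the raw keyword doesn't help is to record what rolling().apply() actually passes in. In the sketch below, record_shape and seen_shapes are illustrative names, not pandas API. It shows each call receiving one column's window as a 1-D array, so the function never sees the whole (3, 2) window.

```python
import pandas as pd

data = [[1, 2], [3, 4], [3, 4], [6, 6], [9, 1], [11, 2]]
df = pd.DataFrame(columns=list("AB"), data=data)

seen_shapes = []


def record_shape(window):
    # Log what rolling().apply() hands over for each window.
    seen_shapes.append(window.shape)
    # apply() needs a single number back per window, so a tuple such as
    # window.shape cannot be returned directly.
    return float(len(window))


result = df.rolling(3).apply(record_shape, raw=True)
print(set(seen_shapes))  # {(3,)}: one column's window at a time, never (3, 2)
print(result)
```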

Sure, I can do this with a list comprehension; I just thought there might be an easier / cleaner way of doing this.
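One form of that list comprehension is sketched below, assuming pandas 1.1 or later, where iterating over a Rolling object yields each window as a DataFrame. rolling_windows_apply is a hypothetical helper name. Iteration also yields the shorter windows at the start, so the sketch skips those to match the NA rows in the desired output.

```python
import pandas as pd

data = [[1, 2], [3, 4], [3, 4], [6, 6], [9, 1], [11, 2]]
df = pd.DataFrame(columns=list("AB"), data=data)


def rolling_windows_apply(frame, window, func):
    """Hypothetical helper: apply func to each full rolling window (a DataFrame)."""
    # Iterating over frame.rolling(window) (pandas >= 1.1, assumed) yields every
    # window as a DataFrame, including the shorter ones at the start; those get
    # None here to mimic min_periods=window.
    results = [func(w) if len(w) == window else None for w in frame.rolling(window)]
    return pd.Series(results, index=frame.index, name="some_name")


out = rolling_windows_apply(df, 3, lambda w: w.shape)
print(out)
```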

asked May 05 '19 09:05 by Yair Daon



2 Answers

Not with pd.DataFrame.rolling: its apply is run over each column separately, passing in one window of floats/NaN at a time and expecting a single float/NaN back for each window. I think you'll have better luck with your intuition, a list comprehension:

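import pandas as pd

def rolling_pipe(dataframe, window, fctn):
    # For each end position i (1-based), pass the last `window` rows as a
    # DataFrame to fctn; rows without a full window yet get None
    return pd.Series([dataframe.iloc[i-window: i].pipe(fctn)
                      if i >= window else None
                      for i in range(1, len(dataframe)+1)],
                     index=dataframe.index)

df.pipe(rolling_pipe, 3, lambda x: x.shape)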
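Because fctn receives the whole window as a DataFrame, it can combine columns, which a column-wise rolling().apply() cannot. A small usage sketch with the question's data; the sum of A*B per window is just an illustrative choice of function.

```python
import pandas as pd


def rolling_pipe(dataframe, window, fctn):
    # Same helper as in the answer: fctn receives each full window as a DataFrame.
    return pd.Series([dataframe.iloc[i-window: i].pipe(fctn)
                      if i >= window else None
                      for i in range(1, len(dataframe)+1)],
                     index=dataframe.index)


data = [[1, 2], [3, 4], [3, 4], [6, 6], [9, 1], [11, 2]]
df = pd.DataFrame(columns=list("AB"), data=data)

# A function that needs both columns at once: sum of A*B over the window.
cross = df.pipe(rolling_pipe, 3, lambda w: (w["A"] * w["B"]).sum())
print(cross.tolist())
```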
answered Oct 07 '22 00:10 by Ouyang Ze


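With raw=False (the default since pandas 1.0), the argument supplied to your apply function is a Series that carries the window's slice of the original index. With the default RangeIndex that slice is itself a RangeIndex, with start, stop and step properties, for example:

RangeIndex(start=0, stop=2, step=1)

You can use this index to look up the full rows of the data frame that fall in the window. It holds index labels, not positions, so look them up with .loc (.iloc only happens to work when the labels are 0, 1, 2, ...).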

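import pandas as pd

df = pd.DataFrame([('Sean', i) for i in range(1, 11)], columns=['name', 'value'])

def func(series):
    # series.index holds the labels of the rows in the current window;
    # use them to pull those full rows (all columns) out of df
    view = df.loc[series.index]
    # use view to do something, e.g. count rows whose value is 1, 2 or 8
    count = len(view[view.value.isin([1, 2, 8])])
    return count

# raw=False so that func receives a Series (with its index), not a bare ndarray
df['count'] = df.value.rolling(2).apply(func, raw=False)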

There may be a more efficient way to do this but I'm not sure how.
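The .loc lookup matters once the index is not a plain 0..n-1 range. A sketch of the same idea with hypothetical string labels 'a' to 'j'; here df.iloc[series.index] would fail, while df.loc finds the window's rows.

```python
import pandas as pd

# Same data as the answer, but with string labels instead of the default RangeIndex.
df = pd.DataFrame([('Sean', i) for i in range(1, 11)],
                  columns=['name', 'value'],
                  index=list('abcdefghij'))


def func(series):
    # series.index holds labels such as ['a', 'b'], so look them up with .loc
    view = df.loc[series.index]
    return len(view[view.value.isin([1, 2, 8])])


df['count'] = df.value.rolling(2).apply(func, raw=False)
print(df)
```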

answered Oct 07 '22 00:10 by seanbehan