Pandas rolling apply function to entire window dataframe

Tags: python, pandas, apply, rolling-computation

I want to apply a function to a rolling window. All the answers I saw here are focused on applying to a single row / column, but I would like to apply my function to the entire window. Here is a simplified example:

import pandas as pd
data = [ [1,2], [3,4], [3,4], [6,6], [9,1], [11,2] ]
df = pd.DataFrame(columns=list('AB'), data=data)

This is df:

        A  B
    0   1  2
    1   3  4
    2   3  4
    3   6  6
    4   9  1
    5  11  2

Take some function to apply to the entire window:

df.rolling(3).apply(lambda x: x.shape)

In this example, I would like to get something like:

    some_name   
0   NA  
1   NA  
2   (3,2)   
3   (3,2)   
4   (3,2)   
5   (3,2)

Of course, the shape is used as an example showing f treats the entire window as the object of calculation, not just a row / column. I tried playing with the axis keyword for rolling, as well as with the raw keyword for apply but with no success. Other methods (agg, transform) do not seem to deliver either.

Sure, I can do this with a list comprehension. Just thought there is an easier / cleaner way of doing this.

asked May 05 '19 09:05

Yair Daon

2 Answers

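Not with pd.DataFrame.rolling. The function passed to apply is called on one column at a time: it receives a single column's window as a Series (or an ndarray with raw=True) of floats/NaN and must return a single number, so it never sees the whole window. I think you'll have better luck with your intuition of doing it with a list comprehension: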

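def rolling_pipe(dataframe, window, fctn):
    # Pass each full window (a DataFrame slice of `window` rows) to fctn.
    # Rows that do not yet have a full window get None.
    return pd.Series([dataframe.iloc[i - window:i].pipe(fctn)
                      if i >= window else None
                      for i in range(1, len(dataframe) + 1)],
                     index=dataframe.index)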

df.pipe(rolling_pipe, 3, lambda x: x.shape)
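On pandas 1.1 or later, the Rolling object itself can be iterated, and each iteration yields the current window as a DataFrame. The sketch below uses that instead of manual iloc slicing. The helper name rolling_apply_frame is made up for illustration. Note that iteration also yields the shorter leading windows, so they are filtered out by length here.

```python
import pandas as pd

def rolling_apply_frame(frame, window, func):
    """Apply func to every full window of `frame` (hypothetical helper).

    Assumes pandas >= 1.1, where iterating a Rolling object yields
    one DataFrame per row, including the shorter leading windows.
    """
    results = [func(w) if len(w) == window else None
               for w in frame.rolling(window)]
    # dtype=object keeps tuples and None as they are
    return pd.Series(results, index=frame.index, dtype=object)

data = [[1, 2], [3, 4], [3, 4], [6, 6], [9, 1], [11, 2]]
df = pd.DataFrame(columns=list('AB'), data=data)

result = rolling_apply_frame(df, 3, lambda w: w.shape)
print(result)
```

The first two entries are None and the rest are (3, 2), which matches the output asked for in the question. This is still a Python-level loop, so it is about convenience rather than speed.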

answered Oct 07 '22 00:10

Ouyang Ze

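With raw=False (the default in pandas 1.0 and later), the argument supplied to your apply function is a Series whose index covers the rows of the current window. With a default RangeIndex, that index has start, stop and step properties: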

RangeIndex(start=0, stop=2, step=1)

You can use this to query your data frame.

df = pd.DataFrame([('Sean', i) for i in range(1,11)], columns=['name', 'value'])

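def func(series):
    # series.index holds the labels of the current window, so select by
    # label with .loc (.iloc only works when labels equal positions)
    view = df.loc[series.index]
    # use view to do something...
    count = len(view[view.value.isin([1,2,8])])
    return count

# raw=False makes apply pass a Series (with its index) instead of an ndarray
df['count'] = df.value.rolling(2).apply(func, raw=False)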

There may be a more efficient way to do this but I'm not sure how.
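One possibly faster option, if all columns are numeric and the window function can be written with NumPy, is to build all windows at once with numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later). This is only a sketch under those assumptions and uses the question's df. The helper name rolling_window_blocks is made up, and the per-window sum is just a stand-in for a real calculation.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_window_blocks(frame, window):
    """Return an array of shape (n_windows, window, n_columns).

    Hypothetical helper; assumes a numeric frame and NumPy >= 1.20.
    """
    values = frame.to_numpy()
    # sliding_window_view puts the window axis last:
    # (n_windows, n_columns, window), so swap the last two axes
    blocks = sliding_window_view(values, window_shape=window, axis=0)
    return blocks.transpose(0, 2, 1)

data = [[1, 2], [3, 4], [3, 4], [6, 6], [9, 1], [11, 2]]
df = pd.DataFrame(columns=list('AB'), data=data)

window = 3
blocks = rolling_window_blocks(df, window)

# Example whole-window calculation: sum over all rows and columns
totals = pd.Series(blocks.sum(axis=(1, 2)), index=df.index[window - 1:])
# Reindex so rows without a full window become NaN
totals = totals.reindex(df.index)
print(totals)
```

Each blocks[i] holds the same values as df.iloc[i:i + 3], so any function that works on a 2-D array can be applied per window or vectorised over the first axis.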

answered Oct 07 '22 00:10

seanbehan
