Python pandas rolling_apply two column input into function

Following on from the question "Python custom function using rolling_apply for pandas", about using rolling_apply: although I have made progress with my function, I am struggling with a function that requires two or more columns as inputs.

Creating the same setup as before:

import pandas as pd
import numpy as np
import random

tmp  = pd.DataFrame(np.random.randn(2000,2)/10000, 
                    index=pd.date_range('2001-01-01',periods=2000),
                    columns=['A','B'])

But changing the function slightly to take two columns:

def gm(df,p):
    df = pd.DataFrame(df)
    v =((((df['A']+df['B'])+1).cumprod())-1)*p
    return v.iloc[-1]

It produces the following error:

pd.rolling_apply(tmp,50,lambda x: gm(x,5))

  KeyError: u'no item named A'

I think this is because the input to the lambda function is an ndarray of length 50 taken from the first column only, rather than both columns. Is there a way to get both columns as inputs and use them in a rolling_apply function?

Again any help would be greatly appreciated...
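
The guess above can be checked with a small probe. This is a minimal sketch, assuming a pandas version that provides the .rolling() accessor rather than the pd.rolling_apply call used in the question; probe and seen are hypothetical names introduced here for illustration, and the frame is shortened to 200 rows to keep it quick.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
tmp = pd.DataFrame(np.random.randn(200, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

seen = []  # (type name, shape) of every object handed to the callback

def probe(window):
    # Record what the rolling machinery passes in for this window
    seen.append((type(window).__name__, window.shape))
    return 0.0  # a rolling apply callback must return a single number

tmp.rolling(50).apply(probe, raw=True)

# Distinct (type, shape) pairs across all calls
print(set(seen))
```

Under these assumptions every call receives a one-dimensional array holding 50 values from a single column, so there is no 'A' or 'B' label for df['A'] to look up inside gm.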

asked Jan 10 '14 09:01

h.l.m

2 Answers

Not sure if this is still relevant, but with the newer rolling classes in pandas, passing raw=False to apply hands each window to the wrapped function as a Series rather than an ndarray. That means we have access to the index of each observation in the window, and can use it to look up the matching rows in other columns.

From the docs:

raw : bool, default None

False : passes each row or column as a Series to the function.

True or None : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.

In this scenario, we can do the following:

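### create a func for multiple columns
### (assumes df is the full DataFrame and col2, col3, col4 hold column names)
def cust_func(s):
    # s is the 'col1' window as a Series; s.index selects the same rows in df
    val_for_col2 = df.loc[s.index, col2] #.values
    val_for_col3 = df.loc[s.index, col3] #.values
    val_for_col4 = df.loc[s.index, col4] #.values

    ## apply over multiple column values
    return np.max(s) * np.min(val_for_col2) * np.max(val_for_col3) * np.mean(val_for_col4)

### Apply to the dataframe (a time-based window such as '10s' needs a datetime-like index)
df.rolling('10s')['col1'].apply(cust_func, raw=False)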

Note that we can still use all the functionality of the pandas rolling class here, which is particularly useful when dealing with time-based windows.

The fact that we are passing one column and using the entire dataframe feels like a hack, but it works in practice.
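
As a hedged illustration, the same index trick can be applied to the gm function from the question. This is a sketch rather than the answerer's own code: it assumes a fixed window of 50 rows, a unique index, and a 200-row frame to keep it quick; the parameter name window_a is made up for this example.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
tmp = pd.DataFrame(np.random.randn(200, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

def gm(window_a, df, p):
    # window_a is column A for the current window, passed as a Series;
    # its index labels select the matching rows of the full frame.
    x_df = df.loc[window_a.index]
    v = ((x_df['A'] + x_df['B'] + 1).cumprod() - 1) * p
    return v.iloc[-1]

# Roll over one column only, but read both columns inside gm
res = tmp['A'].rolling(50).apply(lambda s: gm(s, tmp, 5), raw=False)
print(res.dropna().head())
```

The first 49 entries of res are NaN because the window is not yet full. Since raw=False builds a Series for every window, this is likely to be slower than raw=True on large frames.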

answered Oct 25 '22 01:10

calestini

It looks like rolling_apply converts the input to the user function into an ndarray (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.stats.moments.rolling_apply.html?highlight=rolling_apply#pandas.stats.moments.rolling_apply).

A workaround is to add an auxiliary column ii of row positions, roll over that column, and use the positions inside gm to select the window from the full DataFrame:

import pandas as pd
import numpy as np
import random

tmp = pd.DataFrame(np.random.randn(2000,2)/10000, columns=['A','B'])
tmp['date'] = pd.date_range('2001-01-01',periods=2000)
tmp['ii'] = range(len(tmp))            

def gm(ii, df, p):
    # ii arrives as floats; list(...) keeps this working on Python 3, where map is lazy
    x_df = df.iloc[list(map(int, ii))]
    #print x_df
    v =((((x_df['A']+x_df['B'])+1).cumprod())-1)*p
    #print v
    return v.iloc[-1]

#print tmp.head()
res = pd.rolling_apply(tmp.ii, 50, lambda x: gm(x, tmp, 5))
print(res)
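
pd.rolling_apply has since been removed from pandas, so the workaround above needs porting to run on current versions. The following is a minimal sketch of the same auxiliary-column idea using Series.rolling(...).apply with raw=True, assuming a 200-row frame for speed. The check series is an addition for illustration only: for this particular gm, the last value of the cumulative product is just the product over the window, so the columns can be combined first and rolled afterwards.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
tmp = pd.DataFrame(np.random.randn(200, 2) / 10000, columns=['A', 'B'])
tmp['ii'] = np.arange(len(tmp))  # auxiliary column of row positions

def gm(ii, df, p):
    # ii arrives as a float ndarray of row positions; cast back to int
    x_df = df.iloc[ii.astype(int)]
    v = ((x_df['A'] + x_df['B'] + 1).cumprod() - 1) * p
    return v.iloc[-1]

res = tmp['ii'].rolling(50).apply(lambda x: gm(x, tmp, 5), raw=True)

# Cross-check: combine the columns first, then take a rolling product
check = ((1 + tmp['A'] + tmp['B']).rolling(50).apply(np.prod, raw=True) - 1) * 5
print(np.allclose(res.dropna(), check.dropna()))
```

When the function can be rewritten this way, combining the columns before rolling avoids indexing back into the DataFrame for every window.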

answered Oct 25 '22 01:10

lowtech
