
Rolling a function on a data frame

Tags: python, pandas, dataframe, apply

I have the following data frame C.

>>> C
              a    b   c
2011-01-01    0    0 NaN
2011-01-02   41   12 NaN
2011-01-03   82   24 NaN
2011-01-04  123   36 NaN
2011-01-05  164   48 NaN
2011-01-06  205   60   2
2011-01-07  246   72   4
2011-01-08  287   84   6
2011-01-09  328   96   8
2011-01-10  369  108  10
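
For anyone who wants to reproduce this, the frame can be rebuilt roughly as follows. This is a sketch that assumes a daily DatetimeIndex; the values follow a = 41*i and b = 12*i, which matches the printout above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: a = 41*i, b = 12*i for i = 0..9,
# c is NaN for the first five rows and then 2, 4, 6, 8, 10.
idx = pd.date_range("2011-01-01", periods=10, freq="D")  # assumed daily index
C = pd.DataFrame(
    {
        "a": 41 * np.arange(10),
        "b": 12 * np.arange(10),
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=idx,
)
print(C)
```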

I would like to add a new column, d, by applying a rolling function over a fixed window (6 rows here), where for each window the value of c is held fixed at the value from the window's last row (date). One pass of this rolling function should look like this (pseudocode):

              a    b   c   d
2011-01-01    0    0 NaN   a + b*2 (a,b from this row, '2' is from 'c' on 2011-01-06)
2011-01-02   41   12 NaN   a + b*2 (a,b from this row, '2' is still from 2011-01-06)
2011-01-03   82   24 NaN   a + b*2
2011-01-04  123   36 NaN   a + b*2
2011-01-05  164   48 NaN   a + b*2
2011-01-06  205   60   2   a + b*2
2011-01-07  246   72   4   
2011-01-08  287   84   6   
2011-01-09  328   96   8   
2011-01-10  369  108  10

After this "loop" I want to take all 6 values calculated in d and pass them to a function, which returns a single value that should be stored in another column, say e:

              a    b   c   d                               e

2011-01-01    0    0 NaN   a + b*2 ---|                   NaN
2011-01-02   41   12 NaN   a + b*2    |                   NaN
2011-01-03   82   24 NaN   a + b*2    | These values      NaN
2011-01-04  123   36 NaN   a + b*2    | are input to      NaN
2011-01-05  164   48 NaN   a + b*2    | function          NaN
2011-01-06  205   60   2   a + b*2 ---| yielding          X
2011-01-07  246   72   4                value X in
2011-01-08  287   84   6                column 'e'
2011-01-09  328   96   8   
2011-01-10  369  108  10

This procedure would then be repeated for the next window (again 6 rows long), like this:

              a    b   c   d             e
2011-01-01    0    0 NaN   
2011-01-02   41   12 NaN   a + b*4 (a,b from this row, '4' is from 'c' now from 2011-01-07)
2011-01-03   82   24 NaN   a + b*4 (a,b from this row, '4' is still from 2011-01-07)
2011-01-04  123   36 NaN   a + b*4
2011-01-05  164   48 NaN   a + b*4
2011-01-06  205   60   2   a + b*4       X
2011-01-07  246   72   4   a + b*4
2011-01-08  287   84   6   
2011-01-09  328   96   8   
2011-01-10  369  108  10

              a    b   c   d                               e

2011-01-01    0    0 NaN                                  NaN
2011-01-02   41   12 NaN   a + b*4 ---|                   NaN
2011-01-03   82   24 NaN   a + b*4    | These values      NaN
2011-01-04  123   36 NaN   a + b*4    | are input to      NaN
2011-01-05  164   48 NaN   a + b*4    | function          NaN
2011-01-06  205   60   2   a + b*4    | yielding          X
2011-01-07  246   72   4   a + b*4 ---| value Y in        Y
2011-01-08  287   84   6                column 'e'
2011-01-09  328   96   8   
2011-01-10  369  108  10
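
Written as a plain loop, the procedure above would look roughly like this. It is only a sketch of the intent: `rolling_fixed_c` and `func` are placeholder names, and the mean is used purely as a stand-in for the real function.

```python
import numpy as np
import pandas as pd

def rolling_fixed_c(df, window, func):
    """For each full window, fix c at the window's last row, compute
    d = a + b*c over the window's rows, and store func(d) at that last row."""
    e = pd.Series(np.nan, index=df.index)
    for end in range(window - 1, len(df)):
        rows = df.iloc[end - window + 1 : end + 1]  # the 6 rows of this window
        c_fixed = df["c"].iloc[end]                 # c taken from the last row
        d = rows["a"] + rows["b"] * c_fixed         # column d for this window
        e.iloc[end] = func(d)                       # one value per window
    return e

# Same example frame as above (assumed daily index).
C = pd.DataFrame(
    {
        "a": 41 * np.arange(10),
        "b": 12 * np.arange(10),
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)
# The mean is a stand-in; the real function is not specified here.
C["e"] = rolling_fixed_c(C, 6, lambda d: d.mean())
print(C)
```

This works but loops in Python over every window, which is what I would like to avoid.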

Hopefully this is clear enough,

Thanks, N

asked Jan 28 '15 10:01

gussilago

1 Answer

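You could use pd.rolling_apply (pd.rolling_apply was deprecated in pandas 0.18 and later removed, and .ix was removed in pandas 1.0, so the code below runs only on older pandas versions):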

import numpy as np
import pandas as pd
df = pd.read_table('data', sep=r'\s+')

def foo(x, df):
    window = df.iloc[x]
    # print(window)
    c = df.ix[int(x[-1]), 'c']
    dvals = window['a'] + window['b']*c
    return bar(dvals)

def bar(dvals):
    # print(dvals)
    return dvals.mean()

df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,))
print(df)

yields

              a    b   c       e
2011-01-01    0    0 NaN     NaN
2011-01-02   41   12 NaN     NaN
2011-01-03   82   24 NaN     NaN
2011-01-04  123   36 NaN     NaN
2011-01-05  164   48 NaN     NaN
2011-01-06  205   60   2   162.5
2011-01-07  246   72   4   311.5
2011-01-08  287   84   6   508.5
2011-01-09  328   96   8   753.5
2011-01-10  369  108  10  1046.5
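
On recent pandas versions the same idea can be sketched with the `.rolling(...).apply(...)` method. This is an adaptation of the code above, not the original: the `positions` Series is a helper introduced here, `.ix` is replaced by `.iloc`, and the frame is built inline (assuming a daily index) instead of being read from the `data` file.

```python
import numpy as np
import pandas as pd

# Example frame built inline (assumption: daily DatetimeIndex).
df = pd.DataFrame(
    {
        "a": 41 * np.arange(10),
        "b": 12 * np.arange(10),
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)

def bar(dvals):
    return dvals.mean()

def foo(x, df):
    # With raw=True, x is a float ndarray of row positions; cast back to int.
    pos = x.astype(int)
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]          # c fixed at the window's last row
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

# Roll over row positions rather than over a data column, as in the original.
positions = pd.Series(np.arange(len(df)), index=df.index)
df["e"] = positions.rolling(6).apply(foo, raw=True, args=(df,))
print(df)
```

The trick is unchanged: the rolling window runs over integer row positions, and `foo` uses those positions to pull the matching rows out of `df`.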

The args and kwargs parameters were added to rolling_apply in Pandas version 0.14.0.

Since in my example above df is a global variable, it is not really necessary to pass it to foo as an argument. You could simply remove df from the def foo line and also omit the args=(df,) in the call to rolling_apply.

However, there are times when df might not be defined in a scope accessible by foo. In that case, there is a simple workaround -- make a closure:

def foo(df):
    def inner_foo(x):
        window = df.iloc[x]
        # print(window)
        c = df.ix[int(x[-1]), 'c']
        dvals = window['a'] + window['b']*c
        return bar(dvals)
    return inner_foo

df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo(df))
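
Another way to bind df without writing a closure by hand is `functools.partial`. The sketch below uses the newer `.rolling(...).apply(...)` method so that it runs on current pandas; the `positions` Series and the inline frame (with an assumed daily index) are illustrative additions.

```python
from functools import partial

import numpy as np
import pandas as pd

def bar(dvals):
    return dvals.mean()

def foo(x, df):
    pos = x.astype(int)                # raw=True passes positions as floats
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]          # c fixed at the window's last row
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

# Example frame built inline (assumption: daily DatetimeIndex).
df = pd.DataFrame(
    {
        "a": 41 * np.arange(10),
        "b": 12 * np.arange(10),
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)

# partial(foo, df=df) behaves like the closure: it takes only x.
bound_foo = partial(foo, df=df)
positions = pd.Series(np.arange(len(df)), index=df.index)
df["e"] = positions.rolling(6).apply(bound_foo, raw=True)
print(df)
```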

answered Oct 12 '22 07:10

unutbu
