Pandas rolling window multiple columns [duplicate]

Tags: python, pandas
I've got a dataset:

    Open     High      Low    Close        
0  132.960  133.340  132.940  133.105
1  133.110  133.255  132.710  132.755
2  132.755  132.985  132.640  132.735 
3  132.730  132.790  132.575  132.685
4  132.685  132.785  132.625  132.755

I am trying to call rolling.apply with a function that needs several columns from each window, like this:

df['new_col']= df[['Open']].rolling(2).apply(AccumulativeSwingIndex(df['High'],df['Low'],df['Close']))
  • this raises an error

or

df['new_col']=  df[['Open', 'High', 'Low', 'Close']].rolling(2).apply(AccumulativeSwingIndex)
  • this passes only the values from column 'Open' to the function

Can anybody help me?

quarkpol asked Aug 10 '16 16:08


4 Answers

Define your own roll

We can write a function that takes a DataFrame, a window size w, and any other keyword arguments. It builds a new DataFrame that stacks every window under its own key, then calls groupby on that key, passing the keyword arguments through via kwargs.

Note: I didn't have to use stride_tricks.as_strided but it is succinct and in my opinion appropriate.
from numpy.lib.stride_tricks import as_strided as stride
import pandas as pd

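def roll(df, w, **kwargs):
    v = df.values
    d0, d1 = v.shape
    s0, s1 = v.strides

    # 3-D view of shape (n_windows, w, n_columns); no data is copied
    a = stride(v, (d0 - (w - 1), w, d1), (s0, s0, s1))

    # One small DataFrame per window, keyed by the window's first index label
    rolled_df = pd.concat({
        row: pd.DataFrame(values, columns=df.columns)
        for row, values in zip(df.index, a)
    })

    # Group on the window key so any groupby method runs once per window
    return rolled_df.groupby(level=0, **kwargs)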

roll(df, 2).mean()

       Open      High       Low    Close
0  133.0350  133.2975  132.8250  132.930
1  132.9325  133.1200  132.6750  132.745
2  132.7425  132.8875  132.6075  132.710
3  132.7075  132.7875  132.6000  132.720

We can also use the pandas.DataFrame.pipe method to the same effect:

df.pipe(roll, w=2).mean()
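Because roll returns a groupby object, a function that needs several columns at once can be passed to apply, which hands it each window as a small DataFrame. The following is a minimal sketch of that idea using the question's data. The function window_range is a hypothetical stand-in for AccumulativeSwingIndex (whose definition is not shown in the question); it simply subtracts the lowest Low from the highest High in each window.

```python
from numpy.lib.stride_tricks import as_strided as stride
import pandas as pd


def roll(df, w, **kwargs):
    # Same helper as in the answer above
    v = df.values
    d0, d1 = v.shape
    s0, s1 = v.strides
    a = stride(v, (d0 - (w - 1), w, d1), (s0, s0, s1))
    rolled_df = pd.concat({
        row: pd.DataFrame(values, columns=df.columns)
        for row, values in zip(df.index, a)
    })
    return rolled_df.groupby(level=0, **kwargs)


df = pd.DataFrame({
    "Open":  [132.960, 133.110, 132.755, 132.730, 132.685],
    "High":  [133.340, 133.255, 132.985, 132.790, 132.785],
    "Low":   [132.940, 132.710, 132.640, 132.575, 132.625],
    "Close": [133.105, 132.755, 132.735, 132.685, 132.755],
})


def window_range(window):
    # Hypothetical multi-column function: receives one window as a DataFrame
    return window["High"].max() - window["Low"].min()


# One value per window, labelled by the first row of each window
result = roll(df, 2).apply(window_range)
print(result)
```

Note that the result is labelled by the first index of each window, so assigning it back to df leaves the last w - 1 rows empty rather than the first ones.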


OLD ANSWER

Panel has been deprecated and was removed in pandas 0.25, so this version only runs on older releases. See above for the updated answer.

See also https://stackoverflow.com/a/37491779/2336654

Define our own roll:

import numpy as np

def roll(df, w, **kwargs):
    # Stack each w-row slice, then transpose to (n_windows, n_columns, w)
    roll_array = np.dstack([df.values[i:i+w, :] for i in range(len(df.index) - w + 1)]).T
    panel = pd.Panel(roll_array, 
                     items=df.index[w-1:],
                     major_axis=df.columns,
                     minor_axis=pd.Index(range(w), name='roll'))
    return panel.to_frame().unstack().T.groupby(level=0, **kwargs)

You should then be able to run:

roll(df, 2).apply(your_function)

Using mean:

roll(df, 2).mean()

major      Open      High       Low    Close
1      133.0350  133.2975  132.8250  132.930
2      132.9325  133.1200  132.6750  132.745
3      132.7425  132.8875  132.6075  132.710
4      132.7075  132.7875  132.6000  132.720

f = lambda df: df.sum(1)

roll(df, 2, group_keys=False).apply(f)

   roll
1  0       532.345
   1       531.830
2  0       531.830
   1       531.115
3  0       531.115
   1       530.780
4  0       530.780
   1       530.850
dtype: float64
piRSquared answered Nov 05 '22 23:11


Since your rolling window is small, you can also place shifted copies of every column side by side in one DataFrame and then use apply along the rows to reduce them.

For example, with the dataset df as follows:

              Open      High     Low   Close
Date
2017-11-07  258.97  259.3500  258.09  258.67
2017-11-08  258.47  259.2200  258.15  259.11
2017-11-09  257.73  258.3900  256.36  258.17
2017-11-10  257.73  258.2926  257.37  258.09
2017-11-13  257.31  258.5900  257.27  258.33

You can collect the shifted (rolling) data in a new DataFrame with

window = 2
df1 = pd.DataFrame(index=df.index)
for i in range(window):
    df_shifted = df.shift(i).copy()
    df_shifted.columns = ["{}-{}".format(s, i) for s in df.columns]
    df1 = df1.join(df_shifted)
df1

            Open-0    High-0   Low-0  Close-0  Open-1    High-1   Low-1  Close-1
Date
2017-11-07  258.97  259.3500  258.09   258.67     NaN       NaN     NaN      NaN
2017-11-08  258.47  259.2200  258.15   259.11  258.97  259.3500  258.09   258.67
2017-11-09  257.73  258.3900  256.36   258.17  258.47  259.2200  258.15   259.11
2017-11-10  257.73  258.2926  257.37   258.09  257.73  258.3900  256.36   258.17
2017-11-13  257.31  258.5900  257.27   258.33  257.73  258.2926  257.37   258.09

Then you can apply a function to each row, which now holds all the rolling data you need:

df1.apply(AccumulativeSwingIndex, axis=1)
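To make the row-wise step concrete, here is a small sketch under stated assumptions. The function close_change_over_range is hypothetical (it is not the AccumulativeSwingIndex from the question); it only illustrates how a row function can read the current values from the "-0" columns and the previous row's values from the "-1" columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Open":  [258.97, 258.47, 257.73, 257.73, 257.31],
        "High":  [259.35, 259.22, 258.39, 258.2926, 258.59],
        "Low":   [258.09, 258.15, 256.36, 257.37, 257.27],
        "Close": [258.67, 259.11, 258.17, 258.09, 258.33],
    },
    index=pd.DatetimeIndex(
        ["2017-11-07", "2017-11-08", "2017-11-09", "2017-11-10", "2017-11-13"],
        name="Date",
    ),
)

# Build the wide frame: "<col>-0" is the current row, "<col>-1" the previous one
window = 2
df1 = pd.DataFrame(index=df.index)
for i in range(window):
    shifted = df.shift(i)
    shifted.columns = ["{}-{}".format(c, i) for c in df.columns]
    df1 = df1.join(shifted)


def close_change_over_range(row):
    # Hypothetical example: change in Close since the previous row,
    # scaled by the current row's High-Low range
    return (row["Close-0"] - row["Close-1"]) / (row["High-0"] - row["Low-0"])


result = df1.apply(close_change_over_range, axis=1)
print(result)
```

The first row has no predecessor, so its "-1" columns are NaN and the function returns NaN there; a real function may need to handle that case explicitly.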
aliciawyy answered Nov 06 '22 00:11


Here's a workaround I came up with:

df['new_col'] = list(map(fn, df.rolling(2)))
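As a sketch of how this might look end to end: iterating over a Rolling object (supported in pandas 1.1 and later) yields one sub-DataFrame per row, and the first windows contain fewer rows than the window size. The fn below is a hypothetical placeholder that returns NaN for those short windows; the real function from the question is not shown there.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Open":  [132.960, 133.110, 132.755, 132.730, 132.685],
    "High":  [133.340, 133.255, 132.985, 132.790, 132.785],
    "Low":   [132.940, 132.710, 132.640, 132.575, 132.625],
    "Close": [133.105, 132.755, 132.735, 132.685, 132.755],
})


def fn(window):
    # Hypothetical function: `window` is a DataFrame with all columns.
    # Windows at the start are shorter than 2 rows, so return NaN for them.
    if len(window) < 2:
        return np.nan
    return window["High"].max() - window["Low"].min()


# One window per row of df, so the list lines up with the original index
df["new_col"] = list(map(fn, df.rolling(2)))
print(df)
```

Here each value is aligned with the last row of its window, which matches the labelling that rolling(...).apply normally uses.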
Leonardo answered Nov 05 '22 23:11


I ran into a similar problem. The following lines may help. This might be the simplest way to retrieve the data (as matrices) from each window of DataFrame.rolling(), after which we can do almost anything with it. By comparison, d.rolling().apply() only allows aggregation functions.

size = 20
matrices = [x.values for x in d.rolling(size)][size-1:]
len(matrices)
[do_anything(i) for i in matrices]
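For illustration, the same pattern can return something other than a single number per window. In this sketch, do_anything is a hypothetical function that returns a vector (last row minus first row of each window), and the results are collected into a new DataFrame; the window size of 2 and the data are taken from the question only to keep the example small.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({
    "Open":  [132.960, 133.110, 132.755, 132.730, 132.685],
    "High":  [133.340, 133.255, 132.985, 132.790, 132.785],
    "Low":   [132.940, 132.710, 132.640, 132.575, 132.625],
    "Close": [133.105, 132.755, 132.735, 132.685, 132.755],
})

size = 2
# Skip the first size - 1 windows, which are shorter than `size`
matrices = [x.values for x in d.rolling(size)][size - 1:]


def do_anything(m):
    # Hypothetical non-aggregating function: `m` is a (size, n_columns) array;
    # return the per-column change from the first to the last row
    return m[-1] - m[0]


# One row of results per full window, labelled by the window's last index
changes = pd.DataFrame(
    [do_anything(m) for m in matrices],
    index=d.index[size - 1:],
    columns=d.columns,
)
print(changes)
```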
Yin Tang answered Nov 05 '22 23:11