I want a dataframe representation of a rolling window. Instead of performing some operation on a rolling window, I want a dataframe where the window is represented in another dimension. This could be as a pd.Panel or np.array or a pd.DataFrame with a pd.MultiIndex.
import pandas as pd
import numpy as np
np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(10, 3).round(2),
                  columns=['A', 'B', 'C'],
                  index=list('abcdefghij'))
print(df)
      A     B     C
a  0.44  0.41  0.46
b  0.47  0.46  0.02
c  0.85  0.82  0.78
d  0.76  0.93  0.83
e  0.88  0.93  0.72
f  0.12  0.15  0.20
g  0.44  0.10  0.28
h  0.61  0.09  0.84
i  0.74  0.87  0.69
j  0.38  0.23  0.44
For window = 2, I'd expect the result to be:
      0                 1
      A     B     C     A     B     C
a  0.44  0.41  0.46  0.47  0.46  0.02
b  0.47  0.46  0.02  0.85  0.82  0.78
c  0.85  0.82  0.78  0.76  0.93  0.83
d  0.76  0.93  0.83  0.88  0.93  0.72
e  0.88  0.93  0.72  0.12  0.15  0.20
f  0.12  0.15  0.20  0.44  0.10  0.28
g  0.44  0.10  0.28  0.61  0.09  0.84
h  0.61  0.09  0.84  0.74  0.87  0.69
i  0.74  0.87  0.69  0.38  0.23  0.44
I'm not determined to have the layout presented this way, but this is the information I want. I'm looking for the most efficient way to get at this.
I've experimented with using shift in varying ways, but it feels clunky. This is what I use to produce the output above:
print(pd.concat([df, df.shift(-1)], axis=1, keys=[0, 1]).dropna())
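The shift/concat idea above extends to any window size by shifting once per position in the window. The following is only a sketch of that generalization; the helper name `shifted_windows` and the small integer-valued frame are made up for illustration. Note that `dropna()` would also discard rows that contained NaN in the original data, so this assumes the input has no missing values.

```python
import numpy as np
import pandas as pd


def shifted_windows(df, window):
    """Hypothetical helper: place `window` consecutive rows side by side."""
    # shift(-i) moves row t+i up to position t, so block i holds
    # the i-th row of the window that starts at each index label.
    blocks = [df.shift(-i) for i in range(window)]
    out = pd.concat(blocks, axis=1, keys=range(window))
    # The last window-1 rows have no complete window and contain NaN.
    return out.dropna()


# Illustrative data, not the frame from the question.
df = pd.DataFrame(np.arange(12.0).reshape(4, 3),
                  columns=['A', 'B', 'C'], index=list('abcd'))
result = shifted_windows(df, 3)
print(result)
```

Each shift produces a full copy of the frame, so the cost of this approach grows with the window size.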
We could use NumPy to get views into those sliding windows with its esoteric strided tricks. If you are using this new dimension for some reduction like matrix-multiplication, this would be ideal. If, for some reason, you want a 2D output, we need a reshape at the end, which will create a copy though.
Thus, the implementation would look something like this -
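from numpy.lib.stride_tricks import as_strided as strided

def get_sliding_window(df, W, return2D=0):
    a = df.values
    s0, s1 = a.strides  # byte steps along rows and along columns
    m, n = a.shape
    # Reuse the row stride for the new window axis, so no data is copied
    out = strided(a, shape=(m-W+1, W, n), strides=(s0, s0, s1))
    if return2D == 1:
        # Flattening each window into one row forces a copy
        return out.reshape(a.shape[0]-W+1, -1)
    else:
        return out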
Sample run for 2D/3D output -
In [68]: df
Out[68]:
      A     B
0  0.44  0.41
1  0.46  0.47
2  0.46  0.02
3  0.85  0.82
4  0.78  0.76
In [70]: get_sliding_window(df, 3,return2D=1)
Out[70]:
array([[ 0.44,  0.41,  0.46,  0.47,  0.46,  0.02],
       [ 0.46,  0.47,  0.46,  0.02,  0.85,  0.82],
       [ 0.46,  0.02,  0.85,  0.82,  0.78,  0.76]])
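If the 2D result should come back as a DataFrame with the window position as an outer column level, as laid out in the question, one possible way is to build the columns with `pd.MultiIndex.from_product`. This is a sketch under that assumption; `sliding_window_frame` is a hypothetical name, and the data is the first three rows of the question's frame.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided


def sliding_window_frame(df, W):
    """Hypothetical wrapper: 2D sliding windows as a MultiIndex-column frame."""
    a = df.values
    s0, s1 = a.strides
    m, n = a.shape
    windows = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1))
    # Outer level: position inside the window; inner level: original column.
    columns = pd.MultiIndex.from_product([range(W), df.columns])
    # The reshape copies the data, so the result no longer aliases df.
    return pd.DataFrame(windows.reshape(m - W + 1, -1),
                        index=df.index[:m - W + 1], columns=columns)


df = pd.DataFrame([[0.44, 0.41, 0.46],
                   [0.47, 0.46, 0.02],
                   [0.85, 0.82, 0.78]],
                  columns=['A', 'B', 'C'], index=list('abc'))
frame = sliding_window_frame(df, 2)
print(frame)
```

Each row is labelled with the index of the first row in its window, matching the layout requested in the question.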
Here's what the 3D view output would look like -
In [69]: get_sliding_window(df, 3,return2D=0)
Out[69]:
array([[[ 0.44,  0.41],
        [ 0.46,  0.47],
        [ 0.46,  0.02]],

       [[ 0.46,  0.47],
        [ 0.46,  0.02],
        [ 0.85,  0.82]],

       [[ 0.46,  0.02],
        [ 0.85,  0.82],
        [ 0.78,  0.76]]])
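To illustrate the kind of reduction mentioned earlier, here is a small sketch that contracts the window axis of the 3D view against a weight vector with `np.einsum`. The equal weights (which give a plain rolling mean) and the toy array are assumptions chosen for illustration, not something from the original post.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Toy data: 5 rows, 2 columns.
a = np.arange(10.0).reshape(5, 2)
W = 3
s0, s1 = a.strides
m, n = a.shape
# Same 3D view as in get_sliding_window: shape (m-W+1, W, n).
windows = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1))

# Hypothetical weights; equal weights turn the contraction into a rolling mean.
weights = np.full(W, 1.0 / W)
# Sum over the window axis 'w' for every window 'i' and column 'j'.
rolled = np.einsum('iwj,w->ij', windows, weights)
print(rolled)
```

The reduction reads straight from the view, so the windows themselves are never materialized as a separate array.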
Let's time the 3D view output for various window sizes -
In [331]: df = pd.DataFrame(np.random.rand(1000, 3).round(2))
In [332]: %timeit get_3d_shfted_array(df,2) # @Yakym Pirozhenko's soln
10000 loops, best of 3: 47.9 µs per loop
In [333]: %timeit get_sliding_window(df,2)
10000 loops, best of 3: 39.2 µs per loop
In [334]: %timeit get_3d_shfted_array(df,5) # @Yakym Pirozhenko's soln
10000 loops, best of 3: 89.9 µs per loop
In [335]: %timeit get_sliding_window(df,5)
10000 loops, best of 3: 39.4 µs per loop
In [336]: %timeit get_3d_shfted_array(df,15) # @Yakym Pirozhenko's soln
1000 loops, best of 3: 258 µs per loop
In [337]: %timeit get_sliding_window(df,15)
10000 loops, best of 3: 38.8 µs per loop
Let's verify that we are indeed getting views -
In [338]: np.may_share_memory(get_sliding_window(df,2), df.values)
Out[338]: True
The almost constant timings with get_sliding_window even across various window sizes suggest the huge benefit of getting the view instead of copying.
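As a side note that is not part of the original answer: newer NumPy versions (1.20 and later) ship `sliding_window_view`, which builds the same kind of view without hand-computed strides. A minimal sketch, assuming such a NumPy version is installed; it puts the window axis last, so the axis is moved to match the (m-W+1, W, n) layout used above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy data: 5 rows, 2 columns.
a = np.arange(10.0).reshape(5, 2)
W = 3

# Windows along axis 0; the window axis is appended last: shape (m-W+1, n, W).
v = sliding_window_view(a, W, axis=0)
# Move the window axis to position 1 to get shape (m-W+1, W, n).
windows = np.moveaxis(v, -1, 1)

print(windows.shape)
print(np.shares_memory(windows, a))
```

The returned view is read-only by default, which guards against accidental writes into overlapping windows.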