Pandas seems to be missing a R-style matrix-level rolling window function (rollapply(..., by.column = FALSE)
), providing only the vector based version. Thus I tried to follow this question and it works beautifully with the example which can be replicated, but it doesn't work with pandas DataFrame
s even when using the (seemingly identical) underlying Numpy array.
Artificial problem replication:
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided
test = [[x * y for x in range(1, 10)] for y in [10**z for z in range(5)]]
mm = np.array(test, dtype = np.int64)
pp = pd.DataFrame(test).values
mm
and pp
look identical:
The numpy directly-derived matrix gives me what I want perfectly:
as_strided(mm, (mm.shape[0] - 3 + 1, 3, mm.shape[1]), (mm.shape[1] * 8, mm.shape[1] * 8, 8))
That is, it gives me 3 strides of 3 rows each, in a 3d matrix, allowing me to perform computations on a submatrix moving down by one row at a time.
But the pandas-derived version (identical call with mm
replaced by pp
):
as_strided(pp, (pp.shape[0] - 3 + 1, 3, pp.shape[1]), (pp.shape[1] * 8, pp.shape[1] * 8, 8))
is all weird like it's transposed somehow. Is this to do with column/row major order stuff?
I need to do matrix sliding windows in Pandas, and this seems my best shot, especially because it is really fast. What's going on here? How do I get the underlying Pandas array to behave like Numpy?
It seems that the .values
returns the underlying data in Fortran order (as you speculated):
>>> mm.flags # NumPy array
C_CONTIGUOUS : True
F_CONTIGUOUS : False
...
>>> pp.flags # array from DataFrame
C_CONTIGUOUS : False
F_CONTIGUOUS : True
...
This confuses as_strided
which expects the data to be arranged in C order in memory.
To fix things, you could copy the data in C order and use the same strides as in your question:
pp = pp.copy('C')
Alternatively, if you want to avoid copying large amounts of data, adjust the strides to acknowledge the column-order layout of the data:
as_strided(pp, (pp.shape[0] - 3 + 1, 3, pp.shape[1]), (8, 8, pp.shape[0]*8))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With