Will the results of numpy.lib.stride_tricks.as_strided depend on the dtype of the NumPy array?
This question arises from the definition of .strides, which is:
Tuple of bytes to step in each dimension when traversing an array.
Take the following function that I've used in other questions here. It takes a 1d or 2d array and creates overlapping windows of length window. The result will have one more dimension than the input.
import numpy as np

def rwindows(a, window):
    # Treat a 1d input as a single-column 2d array.
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = a.shape[0] - window + 1, window, a.shape[-1]
    # Repeat the row stride so consecutive windows start one row apart.
    strides = (a.strides[0],) + a.strides
    windows = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return np.squeeze(windows)

# examples
# rwindows(np.arange(5), window=2)
# rwindows(np.arange(20).reshape((5,4)), window=2)
Because of the definition of strides and because, for instance, otherwise equivalent arrays of dtype float32 and float64 will have different strides, will this ever blow up my rwindows function above?
I've tried to test this, but only in a non-exhaustive way, and am looking for an answer that (1) explains whether the disclaimer/warning from the function doc has anything to do with what I'm asking here and (2) explains why or why not otherwise-equivalent arrays with different dtypes & strides would yield different results in the above.
No, the warning for as_strided is about two issues that are not really related to the size of the data; both result from writing to the resulting view.
First, nothing guarantees that view = as_strided(a . . . ) only points to memory in a. This is why so much deliberate preparation work is done before calling as_strided. If your algorithm is off, view can easily point to memory that is not in a, which may hold garbage, other variables, or belong to your operating system. If you then write to that view, your data can be lost, misplaced, corrupted . . . or you can crash your computer.

For your specific example, how safe it is depends a lot on what inputs you're using. You've set strides from a.strides, so they adapt to the dtype of a automatically. You may want to assert that the dtype of a isn't something weird like object.
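As a rough check of the point about a.strides, the sketch below re-declares rwindows from the question and runs it on the same values stored as float32 and float64. The byte strides differ, but the windows hold the same values, because as_strided only ever sees strides read from the array itself. The helper name same_windows is made up for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def rwindows(a, window):
    # Same logic as the function in the question.
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = a.shape[0] - window + 1, window, a.shape[-1]
    strides = (a.strides[0],) + a.strides
    return np.squeeze(as_strided(a, shape=shape, strides=strides))


def same_windows(dtype_a, dtype_b, window=2):
    # Hypothetical helper: build windows from two dtypes and compare values.
    base = np.arange(20).reshape(5, 4)
    wa = rwindows(base.astype(dtype_a), window)
    wb = rwindows(base.astype(dtype_b), window)
    return np.array_equal(wa, wb)


a32 = np.arange(20, dtype=np.float32).reshape(5, 4)
a64 = np.arange(20, dtype=np.float64).reshape(5, 4)
print(a32.strides, a64.strides)  # (16, 4) (32, 8): strides scale with itemsize
print(same_windows(np.float32, np.float64))  # True
```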
If you're sure that you will always have a 2-d a that is larger than window, you will probably be fine with your algorithm, but you can also assert that to make sure. If not, you may want to make sure that the as_strided output works for n-d a arrays. For instance:
shape = a.shape[0] - window + 1, window, a.shape[-1]
Should probably be
shape = (a.shape[0] - window + 1, window) + a.shape[1:]
in order to accept n-d input. It would probably never be a problem as far as referencing bad memory, but the current shape would reference the wrong data in a if you had more dimensions.
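A minimal sketch of that change, assuming the rest of the question's function stays as it is. The name rwindows_nd is hypothetical, and the result is compared against windows built with plain slicing:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def rwindows_nd(a, window):
    # Hypothetical n-d variant of rwindows using the corrected shape.
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = (a.shape[0] - window + 1, window) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    return np.squeeze(as_strided(a, shape=shape, strides=strides))


a = np.arange(60).reshape(5, 4, 3)
windows = rwindows_nd(a, window=2)

# Reference result built with ordinary slicing (copies, no stride tricks).
expected = np.stack([a[i:i + 2] for i in range(a.shape[0] - 2 + 1)])

print(windows.shape)                       # (4, 2, 4, 3)
print(np.array_equal(windows, expected))   # True
```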
Second, because the windows overlap, the view references the same memory locations more than once. If you write to it (view[:] = foo or bar( . . ., out = view)), the results can be unpredictable and probably not what you expect.

That said, if you are afraid of problems and don't need to write to the as_strided view (as you don't for most convolution applications where it is commonly used), you can always pass writeable=False, which prevents the writes behind both problems even if your strides and/or shape are incorrect.
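A small sketch of what the read-only flag does, using an assumed 1-d input and window length of 2. Any attempt to write through the view raises ValueError, and the original array is left untouched:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(5)
shape = (a.shape[0] - 2 + 1, 2)
strides = (a.strides[0], a.strides[0])

# Read-only view of overlapping windows of length 2.
view = as_strided(a, shape=shape, strides=strides, writeable=False)
print(view.flags.writeable)  # False

try:
    view[0, 0] = 99  # writes to a read-only view are rejected
    write_blocked = False
except ValueError:
    write_blocked = True

print(write_blocked)  # True
print(a)              # [0 1 2 3 4]
```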
EDIT: As pointed out by @hpaulj, in addition to those two problems, if you do something to a view that makes a copy (like .flatten() or fancy indexing a large chunk of it), it can cause a MemoryError.
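To get a feel for the sizes involved, here is a sketch with assumed numbers (1000 float64 values, window of 100). The view shares a's buffer, but its nbytes reports what a full copy would need, and flatten() allocates that much:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(1000, dtype=np.float64)
window = 100
shape = (a.shape[0] - window + 1, window)
strides = (a.strides[0], a.strides[0])
view = as_strided(a, shape=shape, strides=strides, writeable=False)

print(a.nbytes)                    # 8000 bytes actually held by a
print(view.nbytes)                 # 720800 bytes needed to copy the view
print(np.shares_memory(a, view))   # True: the view allocates no new data

copied = view.flatten()            # flatten() always returns a copy
print(np.shares_memory(a, copied)) # False: a new, much larger buffer
```

With larger inputs or windows, the copy grows roughly as the number of windows times the window length, which is where a MemoryError can come from.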