Will results of numpy.as_strided depend on input dtype?

Will the results of numpy.lib.stride_tricks.as_strided depend on the dtype of the NumPy array?

This question arises from the definition of .strides, which is

Tuple of bytes to step in each dimension when traversing an array.
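A quick illustration of that definition: because strides are measured in bytes, two arrays with identical values but different dtypes report different strides.

```python
import numpy as np

# Same logical layout, different itemsize, so different byte strides.
a32 = np.arange(6, dtype=np.float32).reshape(2, 3)
a64 = np.arange(6, dtype=np.float64).reshape(2, 3)
print(a32.strides)  # (12, 4): each float32 is 4 bytes, a row is 3 * 4 = 12
print(a64.strides)  # (24, 8): each float64 is 8 bytes, a row is 3 * 8 = 24
```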

Take the following function that I've used in other questions here. It takes a 1d or 2d array and creates overlapping windows of length window. The result will be one dimension greater than the input.

import numpy as np

def rwindows(a, window):
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = a.shape[0] - window + 1, window, a.shape[-1]
    strides = (a.strides[0],) + a.strides
    windows = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return np.squeeze(windows)

# examples
# rwindows(np.arange(5), window=2)
# rwindows(np.arange(20).reshape((5,4)), window=2)

Because strides are measured in bytes, otherwise-equivalent arrays of dtype float32 and float64 will have different strides. Will this ever blow up my rwindows function above?

I've tried to test this, but only in a non-exhaustive way. I'm looking for an answer that (1) explains whether the disclaimer/warning in the function docs has anything to do with what I'm asking here, and (2) explains why (or why not) otherwise-equivalent arrays with different dtypes and strides would yield different results in the above.
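For reference, here is the kind of (still non-exhaustive) check I mean, using the rwindows function above on the same values cast to two dtypes:

```python
import numpy as np

def rwindows(a, window):
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = a.shape[0] - window + 1, window, a.shape[-1]
    strides = (a.strides[0],) + a.strides
    windows = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return np.squeeze(windows)

base = np.arange(20).reshape(5, 4)
w32 = rwindows(base.astype(np.float32), window=2)
w64 = rwindows(base.astype(np.float64), window=2)
# Same window values either way, even though the byte strides differ.
assert np.array_equal(w32, w64)
```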

Brad Solomon asked Aug 15 '17

1 Answer

No. The warning for as_strided concerns two issues that are not really related to the size of the dtype; they mostly result from writing to the resulting view.

  1. First, there is no protection to assure that view = as_strided(a, ...) only points to memory in a. This is why there is so much deliberate preparation work done before calling as_strided. If your algorithm is off, you can easily have your view point to memory that is not in a, which may contain garbage, other variables, or belong to your operating system. If you then write to that view, your data can be lost, misplaced, corrupted... or crash your computer.

For your specific example, how safe it is depends a lot on what inputs you're using. You've built strides from a.strides, so they scale with the itemsize automatically and the windows reference the same logical elements regardless of dtype. You may want to assert that the dtype of a isn't something weird like object.

If you're sure that you will always have a 2-d a that is larger than window, you will probably be fine with your algorithm, but you can also assert that to make sure. If not, you may want to make sure that the as_strided output works for n-d a arrays. For instance:

shape = a.shape[0] - window + 1, window, a.shape[-1]

Should probably be

shape = (a.shape[0] - window + 1, window) + a.shape[1:]

in order to accept n-d input. It would probably never be a problem as far as referencing bad memory is concerned, but the current shape would reference the wrong data in a if you had more dimensions.
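A sketch of that suggested fix in a hypothetical n-d variant (rwindows_nd is my name, not the asker's), applied to a 3-d input:

```python
import numpy as np

def rwindows_nd(a, window):
    # Hypothetical n-d variant using the suggested shape expression.
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = (a.shape[0] - window + 1, window) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    return np.squeeze(
        np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    )

a = np.arange(24).reshape(4, 3, 2)   # 3-d input
w = rwindows_nd(a, window=2)
print(w.shape)  # (3, 2, 3, 2): three windows of two 3x2 slices each
# Each window is just consecutive leading slices of a.
assert np.array_equal(w[0], a[:2])
```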

  2. Second, the view created references the same data blocks multiple times. If you then do a parallel write to that view (through view[...] = foo or bar(..., out=view)), the results can be unpredictable and probably not what you expect.

That said, if you are afraid of problems and don't need to write to the as_strided view (as you don't for most convolution applications where it is commonly used), you can always pass writeable=False, which will prevent both problems even if your strides and/or shape are incorrect.
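A minimal sketch of that safeguard: with writeable=False, any attempt to write through the view raises immediately instead of silently corrupting data.

```python
import numpy as np

a = np.arange(5)
shape = (4, 2)
strides = (a.strides[0], a.strides[0])  # overlapping windows of length 2
v = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides,
                                    writeable=False)
try:
    v[0, 0] = 99                        # writing through the view is blocked
except ValueError as e:
    print("write blocked:", e)
```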

EDIT: As pointed out by @hpaulj, in addition to those two problems, if you do something to a view that makes a copy (like .flatten() or fancy indexing a large chunk of it), it can cause a MemoryError.

Daniel F answered Nov 12 '22