Numpy View Reshape Without Copy (2d Moving/Sliding Window, Strides, Masked Memory Structures)

I have an image stored as a 2-d NumPy array (possibly with more dimensions).

I can make a view onto that array that represents a 2-d sliding window, but when I reshape it so that each row is a flattened window (each row is a window, each column is a pixel in that window), NumPy makes a full copy. It does this because I'm using the typical stride trick, and the new shape cannot be described by strides over the original memory.

I need this because I'm passing entire large images to an sklearn classifier, which accepts 2-d matrices and has no batch/partial-fit procedure, and the full expanded copy is far too large for memory.

My question: is there a way to do this without making a full copy of the view?

I believe an answer will either be (1) something about strides or NumPy memory management that I've overlooked, or (2) some kind of masked memory structure for Python that can emulate a NumPy array even to an external package like sklearn that includes Cython.

This task of training over moving windows of a 2d image in memory is common, but the only attempt I know of to account for patches directly is the Vigra project (http://ukoethe.github.io/vigra/).

Thanks for the help.

>>> import numpy as np
>>> A = np.arange(9).reshape(3, 3)
>>> print(A)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
>>> xstep = 1; ystep = 1; xsize = 2; ysize = 2
>>> # Number of windows per axis is (n - size) // step + 1; floor division keeps the shape integral on Python 3.
>>> window_view = np.lib.stride_tricks.as_strided(A, ((A.shape[0] - xsize) // xstep + 1, (A.shape[1] - ysize) // ystep + 1, xsize, ysize),
...       (A.strides[0] * xstep, A.strides[1] * ystep, A.strides[0], A.strides[1]))
>>> print(window_view)
[[[[0 1]
   [3 4]]

  [[1 2]
   [4 5]]]


 [[[3 4]
   [6 7]]

  [[4 5]
   [7 8]]]]
>>> 
>>> np.may_share_memory(A,window_view)
True
>>> B=window_view.reshape(-1,xsize*ysize)
>>> np.may_share_memory(A,B)
False
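
For reference, the same behaviour can be reproduced with `sliding_window_view`, assuming NumPy 1.20 or later is available (that version requirement is an assumption about the environment, not part of the original setup). This is only a sketch of the problem, not a fix: the 4-d window view shares memory with `A`, while the 2-d reshape does not.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.arange(9).reshape(3, 3)
xsize, ysize = 2, 2

# Read-only 4-d view: (window rows, window cols, xsize, ysize).
window_view = sliding_window_view(A, (xsize, ysize))
print(window_view.shape)                 # (2, 2, 2, 2)
print(np.shares_memory(A, window_view))  # True

# Flattening each window into one row cannot be expressed with strides,
# so reshape silently falls back to a copy.
B = window_view.reshape(-1, xsize * ysize)
print(np.shares_memory(A, B))            # False
```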


asked Jul 18 '14 02:07

locallyoptimal

1 Answer

Your task isn't possible using strides alone, but NumPy does support one kind of array that does the job: with strides and masked_array you can create the desired view of your data. However, not all NumPy functions support masked_array, so it is possible that scikit-learn doesn't handle these well either.

Let's first take a fresh look at what we are trying to do here. Consider the input data of your example. Fundamentally the data is just a 1-d array in memory, and it is simpler to think about the strides in those terms. The array only appears to be 2-d because we have defined its shape. Using strides, the shape could be defined like this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize
A = as_strided(base, shape=(3, 3), strides=(3 * isize, isize))
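
As a quick sanity check (a sketch only, with the variable names `reshaped`, `same_values`, `same_strides` and `shares` introduced here for illustration), the strided `A` can be compared with an ordinary `reshape` of `base`:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize
A = as_strided(base, shape=(3, 3), strides=(3 * isize, isize))

# An ordinary reshape of a contiguous 1-d array yields the same view.
reshaped = base.reshape(3, 3)
same_values = np.array_equal(A, reshaped)
same_strides = A.strides == reshaped.strides
shares = np.shares_memory(A, base)
print(same_values, same_strides, shares)  # True True True
```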

Now the goal is to find strides for base that lay out the numbers as in the final array, B. In other words, we are asking for integers a and b such that

>>> as_strided(base, shape=(4, 4), strides=(a, b))
array([[0, 1, 3, 4],
       [1, 2, 4, 5],
       [3, 4, 6, 7],
       [4, 5, 7, 8]])
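
One way to check whether such a pair exists is to try every pair of non-negative strides that stays inside the 9-element buffer. The sketch below is an illustration, not part of the original answer: it measures `a` and `b` in elements rather than bytes, and uses plain index arithmetic instead of `as_strided` so that nothing is ever read out of bounds.

```python
import numpy as np

base = np.arange(9)
target = np.array([[0, 1, 3, 4],
                   [1, 2, 4, 5],
                   [3, 4, 6, 7],
                   [4, 5, 7, 8]])

# Element [i, j] of a strided view sits at offset i*a + j*b (in elements).
i, j = np.indices(target.shape)
found = []
for a in range(base.size):
    for b in range(base.size):
        offsets = i * a + j * b
        if offsets.max() >= base.size:
            continue  # this pair would read outside the buffer
        if np.array_equal(base[offsets], target):
            found.append((a, b))

print(found)  # []
```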

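This is impossible: element [i, j] of a strided array sits at byte offset i*a + j*b, so the offsets along any row must be evenly spaced, whereas the first row of B needs the offsets 0, 1, 3, 4 (in elements). The closest view we can achieve like this is a rolling window over base: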

>>> C = as_strided(base, shape=(5, 5), strides=(isize, isize))
>>> C
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])

The difference is that C has extra rows and columns, which we would like to get rid of. Effectively, we are asking for a rolling window that is not contiguous and that also makes jumps at regular intervals. In this example we want to exclude every third item from the window (column 2 of C) and skip one starting position after every two rows (row 2 of C).

We can describe this as a masked_array:

>>> mask = np.zeros((5, 5), dtype=bool)
>>> mask[2, :] = True
>>> mask[:, 2] = True
>>> D = np.ma.masked_array(C, mask=mask)

This array contains exactly the data that we want, and it is only a view onto the original data. We can confirm that the unmasked data equals B (the boolean indexing used for this check does make a copy):

>>> D.data[~D.mask].reshape(4, 4)
array([[0, 1, 3, 4],
       [1, 2, 4, 5],
       [3, 4, 6, 7],
       [4, 5, 7, 8]])
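
The same construction can be sketched for a general image and window size. The helper `masked_window_rows` below is hypothetical (it is not a NumPy function) and assumes a C-contiguous 2-d image and a step of 1; the result is checked against `sliding_window_view`, which in turn assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view


def masked_window_rows(image, h, w):
    """Hypothetical helper: masked rolling window over the flattened image.

    The unmasked entries, read row by row, are the flattened h-by-w windows.
    Assumes `image` is 2-d and C-contiguous.
    """
    H, W = image.shape
    base = image.reshape(-1)            # a view for C-contiguous input
    isize = base.itemsize
    n_rows = (H - h) * W + (W - w) + 1  # flat start offsets of the windows
    n_cols = (h - 1) * W + w            # flat offsets inside one window
    C = as_strided(base, shape=(n_rows, n_cols), strides=(isize, isize))

    p = np.arange(n_rows)
    q = np.arange(n_cols)
    keep_row = p % W <= W - w  # window must not run past the right edge
    keep_col = q % W < w       # pixel must lie inside the window's width
    mask = ~np.outer(keep_row, keep_col)
    return np.ma.masked_array(C, mask=mask)


img = np.arange(20).reshape(4, 5)
D = masked_window_rows(img, 2, 3)
rows = D.data[~D.mask].reshape(-1, 2 * 3)  # this check copies the data
expected = sliding_window_view(img, (2, 3)).reshape(-1, 2 * 3)
print(np.array_equal(rows, expected))      # True
```

Note that the boolean mask in this sketch is allocated in full, at the shape of C, so the sketch by itself does not reduce memory use for large images.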

But as I said at the beginning, it is quite likely that scikit-learn doesn't understand masked arrays. If it simply converts D to a plain array, the data will be wrong:

>>> np.array(D)
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])
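
The trade-off can be sketched as follows: converting with `np.asarray` drops the mask and returns the full 5x5 data, while `compressed()` returns only the unmasked values but has to copy them. The names `plain` and `B` are introduced here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize
C = as_strided(base, shape=(5, 5), strides=(isize, isize))
mask = np.zeros((5, 5), dtype=bool)
mask[2, :] = True
mask[:, 2] = True
D = np.ma.masked_array(C, mask=mask)

# A plain-array conversion ignores the mask entirely.
plain = np.asarray(D)
print(plain.shape)                # (5, 5)

# compressed() keeps only the unmasked values, at the cost of a copy.
B = D.compressed().reshape(4, 4)
print(B)
print(np.shares_memory(B, base))  # False
```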


answered Nov 15 '22 08:11

jasaarim

Tags:

python

image

numpy

scikit-learn

scikit-image
