Numpy reshape on view

I'm confused by the result of numpy reshape applied to a view. In the session below, q.flags shows that q does not own its data, yet q.base is neither x nor y, so what is it? I'm also surprised that q.strides is (8,), which means (if I understand correctly) that each next element is found by moving 8 bytes in memory. But if no array other than x owns data, the only data buffer is the one belonging to x, and that layout does not allow stepping through the elements of q 8 bytes at a time.

In [99]: x = np.random.rand(4, 4)

In [100]: y = x.T

In [101]: q = y.reshape(16)

In [102]: q.base is y
Out[102]: False

In [103]: q.base is x
Out[103]: False

In [104]: y.flags
Out[104]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [105]: q.flags
Out[105]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [106]: q.strides
Out[106]: (8,)

In [107]: x
Out[107]: 
array([[ 0.62529694,  0.20813211,  0.73932923,  0.43183722],
       [ 0.09755023,  0.67082005,  0.78412615,  0.40307291],
       [ 0.2138691 ,  0.35191283,  0.57455781,  0.2449898 ],
       [ 0.36476299,  0.36590522,  0.24371933,  0.24837697]])

In [108]: q
Out[108]: 
array([ 0.62529694,  0.09755023,  0.2138691 ,  0.36476299,  0.20813211,
        0.67082005,  0.35191283,  0.36590522,  0.73932923,  0.78412615,
        0.57455781,  0.24371933,  0.43183722,  0.40307291,  0.2449898 ,
        0.24837697])

UPDATE:

It turns out that this question has been asked in the numpy discussion forum: http://numpy-discussion.10968.n7.nabble.com/OWNDATA-flag-and-reshape-views-vs-copies-td10363.html

asked by shaoyl85, Mar 05 '15 20:03



2 Answers

In short: you cannot rely on ndarray.flags['OWNDATA'] alone to tell a view from a copy.

>>> import numpy as np
>>> x = np.random.rand(2,2)
>>> y = x.T
>>> q = y.reshape(4)
>>> y[0,0]
0.86751629121019136
>>> y[0,0] = 1
>>> q
array([ 0.86751629,  0.87671107,  0.65239976,  0.41761267])
>>> x
array([[ 1.        ,  0.65239976],
       [ 0.87671107,  0.41761267]])
>>> y
array([[ 1.        ,  0.87671107],
       [ 0.65239976,  0.41761267]])
>>> y.flags['OWNDATA']
False
>>> x.flags['OWNDATA']
True
>>> q.flags['OWNDATA']
False
>>> np.may_share_memory(x,y)
True
>>> np.may_share_memory(x,q)
False

Because q, unlike x and y, did not reflect the change to the first element, it cannot be a view on x or y. Its data must live in a separate buffer, even though q itself does not own it (how this happens is explained below).

There is more discussion about the OWNDATA flag on the numpy-discussion mailing list. The question "How can I tell if NumPy creates a view or a copy?" briefly mentions that simply checking flags.owndata of an ndarray sometimes seems to fail and looks unreliable, as you mention. That's because every ndarray also has a base attribute:

The base of an ndarray is a reference to another array if the memory originated elsewhere (otherwise, the base is None). The operation y.reshape(4) creates a copy, not a view, because the strides of y are (8, 16). To read y in C order as a flat array of shape (4,), the memory pointer would have to jump through byte offsets 0 -> 16 -> 8 -> 24, which cannot be done with a single stride.

Thus q.base refers to the array produced by the forced copy inside y.reshape. It has the same shape as y but holds copied elements, so it has normal C-contiguous strides again: (16, 8). No other name is bound to that object, since it exists only as the intermediate result of y.reshape(4). Only this copy can be viewed with shape (4,), because its strides allow it. q is then indeed a view on q.base.

Most people would find it confusing that q.flags.owndata is False, because, as shown above, q is not a view on y. It is, however, a view on a copy of y, and that copy, q.base, is the owner of the data. So the flags are actually correct if you inspect them closely.
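A minimal sketch of the check described above, using a small 2x2 array with known values instead of random data. np.shares_memory is used as the test for "same buffer"; the exact object stored in q.base is an implementation detail that may differ between NumPy versions, so the code only prints it when present rather than relying on it.

```python
import numpy as np

x = np.arange(4, dtype=np.float64).reshape(2, 2)
y = x.T            # view on x with reversed strides
q = y.reshape(4)   # cannot be expressed as a strided view on x's buffer

print(y.strides)                # (8, 16)
print(np.shares_memory(x, y))   # True: y is a view on x
print(np.shares_memory(x, q))   # False: q lives in a separate buffer

# q.base is the intermediate copy on the NumPy version discussed here;
# treat this as an assumption and guard against None.
base = q.base
if base is not None:
    print(base.shape, base.strides, base.flags['OWNDATA'])
    print(np.shares_memory(q, base))
```

Writing to y[0, 0] after this changes x[0, 0] but leaves q[0] untouched, which matches the session above.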

answered by Oliver W., Nov 15 '22 13:11


I like to use .__array_interface__.

In [811]: x.__array_interface__
Out[811]: 
{'data': (149194496, False),
 'descr': [('', '<f8')],
 'shape': (4, 4),
 'strides': None,
 'typestr': '<f8',
 'version': 3}

In [813]: y.__array_interface__
Out[813]: 
{'data': (149194496, False),
 'descr': [('', '<f8')],
 'shape': (4, 4),
 'strides': (8, 32),
 'typestr': '<f8',
 'version': 3}

In [814]: x.strides
Out[814]: (32, 8)
In [815]: y.strides
Out[815]: (8, 32)

Transpose was performed by reversing the strides. The base data pointer is the same.
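The same observation as a compact check, assuming a C-contiguous float64 array freshly created with np.zeros (so x owns its buffer). The names x_ptr and y_ptr are just illustrative.

```python
import numpy as np

x = np.zeros((4, 4), dtype=np.float64)
y = x.T

# First element of the 'data' tuple is the address of the buffer start.
x_ptr = x.__array_interface__['data'][0]
y_ptr = y.__array_interface__['data'][0]

print(x.strides, y.strides)   # (32, 8) (8, 32)
print(x_ptr == y_ptr)         # True: same starting address
print(y.base is x)            # True: y is a view on x
```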

In [817]: q.__array_interface__
Out[817]: 
{'data': (165219304, False),
 'descr': [('', '<f8')],
 'shape': (16,),
 'strides': None,
 'typestr': '<f8',
 'version': 3}

So the q data is a copy (different pointer). Its strides of (8,) mean that its elements are accessed by stepping from one f8 to the next ('strides': None in the interface indicates a C-contiguous array). By contrast, x.reshape(16) is a view of x, because its data can be accessed with a simple 8-byte step.

To access the original data in the q order, it would have to step 32 bytes 3 times (down x rows), then go back to the start and step 8 to the 2nd x column, followed by 3 row steps, etc. Since striding doesn't work this way, it has to work from a copy.
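The walk described above can be written out as byte offsets. This is only a sketch of the arithmetic, assuming a 4x4 float64 array; it computes where each element of y sits in x's buffer when y is visited in C order, and shows that the step between consecutive elements is not constant.

```python
import numpy as np

x = np.arange(16, dtype=np.float64).reshape(4, 4)
y = x.T   # strides (8, 32)

# Byte offset of y[i, j] in the shared buffer, visited in C (row-major) order.
offsets = [i * y.strides[0] + j * y.strides[1]
           for i in range(4) for j in range(4)]
steps = np.diff(offsets)

print(offsets[:6])                   # [0, 32, 64, 96, 8, 40]
print(sorted(set(steps.tolist())))   # [-88, 32]
```

A one-dimensional view needs a single fixed step, and here two different steps occur (32 and -88), so no stride value can describe q on top of x's buffer.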

Note also that y[0,0] changes x[0,0], but q[0] is independent of both.

While OWNDATA for q is False, it is True for y.ravel() and y.flatten(). I suspect reshape() in this case makes a copy and then reshapes it, so the intermediate copy, q.base, is what 'owns' the data.
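A small comparison of the three flattening calls, as a sketch. The OWNDATA values are printed rather than assumed, since they depend on how each function builds its result and may vary by NumPy version. The np.shares_memory column is the more direct answer to "is this independent of x?".

```python
import numpy as np

x = np.arange(16, dtype=np.float64).reshape(4, 4)
y = x.T

results = {
    'reshape': y.reshape(16),
    'ravel': y.ravel(),
    'flatten': y.flatten(),
}
for name, arr in results.items():
    # OWNDATA differs between the calls, but none of them shares x's buffer.
    print(name, arr.flags['OWNDATA'], np.shares_memory(arr, x))

# Reshaping the C-contiguous original needs no copy.
view = x.reshape(16)
print(np.shares_memory(view, x))   # True
```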

answered by hpaulj, Nov 15 '22 12:11