I'm confused about the result of numpy reshape applied to a view. In the following session, q.flags shows that q does not own its data, but q.base is neither x nor y, so what is it? I'm also surprised that q.strides is (8,). If I understand correctly, that means q gets to the next element by moving 8 bytes in memory each time. However, if no array other than x owns data, then the only data buffer is x's, and that buffer does not allow getting the next element of q by moving 8 bytes.
In [99]: x = np.random.rand(4, 4)
In [100]: y = x.T
In [101]: q = y.reshape(16)
In [102]: q.base is y
Out[102]: False
In [103]: q.base is x
Out[103]: False
In [104]: y.flags
Out[104]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [105]: q.flags
Out[105]:
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [106]: q.strides
Out[106]: (8,)
In [107]: x
Out[107]:
array([[ 0.62529694, 0.20813211, 0.73932923, 0.43183722],
[ 0.09755023, 0.67082005, 0.78412615, 0.40307291],
[ 0.2138691 , 0.35191283, 0.57455781, 0.2449898 ],
[ 0.36476299, 0.36590522, 0.24371933, 0.24837697]])
In [108]: q
Out[108]:
array([ 0.62529694, 0.09755023, 0.2138691 , 0.36476299, 0.20813211,
0.67082005, 0.35191283, 0.36590522, 0.73932923, 0.78412615,
0.57455781, 0.24371933, 0.43183722, 0.40307291, 0.2449898 ,
0.24837697])
UPDATE:
It turns out that this question has been asked in the numpy discussion forum: http://numpy-discussion.10968.n7.nabble.com/OWNDATA-flag-and-reshape-views-vs-copies-td10363.html
If you have an array of shape (2, 4) and reshape it with (-1, 1), the resulting array has only 1 column. With 8 elements that is only possible with 8 rows, hence the shape (8, 1).
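A quick check of the (-1, 1) case, as a minimal sketch using np.arange so the values are predictable:

```python
import numpy as np

a = np.arange(8).reshape(2, 4)   # shape (2, 4), 8 elements
col = a.reshape(-1, 1)           # -1 lets NumPy infer the number of rows
print(col.shape)                 # (8, 1)
```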
When the requested order is compatible with the array's memory layout, reshape does not produce a copy.
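To illustrate with the transposed array from the question: y = x.T is F-contiguous, so asking reshape for Fortran order matches its memory layout and can return a view, while the default C order forces a copy. This is a sketch; np.shares_memory is used only to check whether the results overlap x's buffer, and the variable names are illustrative.

```python
import numpy as np

x = np.random.rand(4, 4)
y = x.T                             # F-contiguous view of x

view = y.reshape(16, order='F')     # Fortran order matches y's layout: no copy
copied = y.reshape(16)              # default C order does not match: NumPy copies

print(np.shares_memory(view, x))    # True
print(np.shares_memory(copied, x))  # False
```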
numpy.reshape() and ndarray.resize() are both used to change the shape of a NumPy array. The difference between them is that reshape() does not change the original array but returns a reshaped array (a view when possible), whereas the ndarray.resize() method returns None and changes the original array in place. The function numpy.resize(), by contrast, returns a new array.
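A minimal sketch of the three behaviours side by side. It resizes c to the same total size, so the in-place method needs no reallocation:

```python
import numpy as np

a = np.arange(6)
b = np.reshape(a, (2, 3))            # returns a reshaped array (here a view); a is unchanged
print(a.shape, b.shape)              # (6,) (2, 3)

c = np.arange(6)
result = c.resize((2, 3))            # ndarray.resize works in place and returns None
print(result, c.shape)               # None (2, 3)

d = np.resize(np.arange(6), (3, 3))  # the function np.resize returns a new array,
print(d.shape)                       # repeating the data to fill it: (3, 3)
```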
In short: you cannot always rely on ndarray.flags['OWNDATA'].
>>> import numpy as np
>>> x = np.random.rand(2,2)
>>> y = x.T
>>> q = y.reshape(4)
>>> y[0,0]
0.86751629121019136
>>> y[0,0] = 1
>>> q
array([ 0.86751629, 0.87671107, 0.65239976, 0.41761267])
>>> x
array([[ 1. , 0.65239976],
[ 0.87671107, 0.41761267]])
>>> y
array([[ 1. , 0.87671107],
[ 0.65239976, 0.41761267]])
>>> y.flags['OWNDATA']
False
>>> x.flags['OWNDATA']
True
>>> q.flags['OWNDATA']
False
>>> np.may_share_memory(x,y)
True
>>> np.may_share_memory(x,q)
False
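Because OWNDATA alone is ambiguous, it can help to ask directly whether two arrays overlap in memory. This minimal sketch uses np.shares_memory, which gives an exact answer, whereas np.may_share_memory (used above) only compares memory bounds and may report overlap that isn't there:

```python
import numpy as np

x = np.random.rand(2, 2)
y = x.T
q = y.reshape(4)

# OWNDATA is False for both y and q, yet only y shares x's buffer.
print(y.flags['OWNDATA'], q.flags['OWNDATA'])  # False False
print(np.shares_memory(x, y))                  # True
print(np.shares_memory(x, q))                  # False
```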
Because q didn't reflect the change in the first element the way x and y did, it must somehow be the owner of the data (the "somehow" is explained below).
There is more discussion about the OWNDATA flag over at the numpy-discussion mailing list. In the question "How can I tell if NumPy creates a view or a copy?", it is briefly mentioned that simply checking the flags.owndata of an ndarray sometimes seems to fail and seems unreliable, as you mention. That's because every ndarray also has a base attribute:
The base of an ndarray is a reference to another array if the memory originated elsewhere (otherwise, the base is None). The operation y.reshape(4) creates a copy, not a view, because the strides of y are (8, 16). To reshape it (C-contiguously) to (4,), the memory pointer would have to jump 0->16->8->24, which is not doable with a single stride. Thus q.base refers to the array produced by the forced copy inside y.reshape, which has the same shape as y but copied elements, and therefore has normal strides again: (16, 8). q.base is not bound to any other name, because it is an intermediate result of the forced copy in y.reshape(4). Only now can q.base be viewed in a (4,) shape, because its strides allow this. q is then indeed a view on q.base.
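The stride argument can be checked numerically. This sketch (float64 data assumed, as produced by np.random.rand) lists the byte offset of each element of y inside x's buffer, in the C order in which reshape(4) reads them:

```python
import numpy as np

x = np.random.rand(2, 2)   # float64, strides (16, 8)
y = x.T                    # strides reversed: (8, 16)

s0, s1 = y.strides
# Byte offset of y[i, j] inside x's buffer, visited in C (row-major) order.
offsets = [i * s0 + j * s1 for i, j in np.ndindex(y.shape)]
print(offsets)             # [0, 16, 8, 24]

# A 1-D view needs one constant step between neighbours; these steps differ.
steps = np.diff(offsets)
print(steps)               # [16 -8 16]
print(len(set(steps.tolist())) == 1)  # False, so reshape(4) has to copy
```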
For most people it would be confusing to see that q.flags.owndata is False because, as shown above, q is not a view on y. However, it is a view on a copy of y. That copy, q.base, is the owner of the data. Thus the flags are actually correct, if you inspect closely.
I like to use .__array_interface__:
In [811]: x.__array_interface__
Out[811]:
{'data': (149194496, False),
'descr': [('', '<f8')],
'shape': (4, 4),
'strides': None,
'typestr': '<f8',
'version': 3}
In [813]: y.__array_interface__
Out[813]:
{'data': (149194496, False),
'descr': [('', '<f8')],
'shape': (4, 4),
'strides': (8, 32),
'typestr': '<f8',
'version': 3}
In [814]: x.strides
Out[814]: (32, 8)
In [815]: y.strides
Out[815]: (8, 32)
Transpose was performed by reversing the strides. The base data pointer is the same.
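Both observations can be checked in a few lines. In this minimal sketch, the first entry of the 'data' tuple in __array_interface__ is the address of the data buffer:

```python
import numpy as np

x = np.random.rand(4, 4)
y = x.T

print(x.strides, y.strides)           # (32, 8) (8, 32)
print(y.strides == x.strides[::-1])   # True: transpose reverses the strides

# Same buffer address, so no data was moved.
print(x.__array_interface__['data'][0] == y.__array_interface__['data'][0])  # True
```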
In [817]: q.__array_interface__
Out[817]:
{'data': (165219304, False),
'descr': [('', '<f8')],
'shape': (16,),
'strides': None,
'typestr': '<f8',
'version': 3}
So the q data is a copy (different pointer). Strides of (8,) mean its elements are accessed by stepping from one f8 to the next. By contrast, x.reshape(16) is a view of x, because its data can be accessed with a simple 8-byte step.
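Comparing buffer addresses makes the difference visible. This is a sketch; the helper addr is a hypothetical convenience function, not part of NumPy:

```python
import numpy as np


def addr(a):
    """Hypothetical helper: start address of an array's data buffer."""
    return a.__array_interface__['data'][0]


x = np.random.rand(4, 4)
flat_view = x.reshape(16)   # x is C-contiguous: one 8-byte stride suffices
q = x.T.reshape(16)         # column-major order of x: needs a copy

print(addr(flat_view) == addr(x))  # True: same buffer
print(addr(q) == addr(x))          # False: new buffer
print(flat_view.strides)           # (8,)
print(flat_view.base is x)         # True
```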
To access the original data in q's order, it would have to step 32 bytes 3 times (down the rows of x), then go back to the start and step 8 bytes to the 2nd column of x, followed by 3 row steps, and so on. Since a single stride can't express this, reshape has to work from a copy.
Note also that y[0,0] changes x[0,0], but q[0] is independent of both.
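A minimal sketch of that independence. The stored value -1.0 is arbitrary, and np.random.rand never returns it, so the comparison is unambiguous:

```python
import numpy as np

x = np.random.rand(4, 4)
y = x.T
q = y.reshape(16)
old_q0 = q[0]

y[0, 0] = -1.0          # y is a view: the write lands in x's buffer
print(x[0, 0])          # -1.0
print(q[0] == old_q0)   # True: q lives in its own buffer
```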
While OWNDATA is False for q, it is True for y.ravel() and y.flatten(). I suspect reshape() in this case makes a copy and then reshapes it, and it's the intermediate copy, q.base, that 'owns' the data.
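That comparison can be reproduced with a short sketch. Here, base is None means the returned array holds its own buffer directly, with no intermediate array in between:

```python
import numpy as np

x = np.random.rand(4, 4)
y = x.T

results = {
    'reshape': y.reshape(16),
    'ravel': y.ravel(),
    'flatten': y.flatten(),
}
for name, arr in results.items():
    print(name, arr.flags['OWNDATA'], arr.base is None)
# reshape False False
# ravel True True
# flatten True True
```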