Looking for a way to reliably identify if a numpy object is a view.
Related questions have come up many times before (here, here, here), and people have offered some solutions, but all seem to have problems:
- The approach in pandas now is to call something a view if my_array.base is not None. This seems to always catch views, but it also produces lots of false positives (situations where it reports something is a view even if it isn't).
- numpy.may_share_memory() will check two specific arrays against each other, but won't answer generically.
- flags['OWNDATA'] is reported (third comment, first answer) to fail in some cases.

(The reason for my interest is that I'm working on implementing copy-on-write for pandas, and a conservative indicator is leading to over-copying.)
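To make the three candidate tests concrete, here is a minimal sketch of what each one reports for a few arrays. The variable names are made up for illustration, and the reshaped case is just one example of what could count as a false positive: the array has a base, but nothing else holds a reference to that buffer.

```python
import numpy as np

original = np.arange(6)                 # freshly allocated, owns its buffer
view = original[::2]                    # strided slice: a genuine view
copied = original[::2].copy()           # independent copy
reshaped = np.arange(6).reshape(2, 3)   # only live handle to its buffer

for name, arr in [("original", original), ("view", view),
                  ("copied", copied), ("reshaped", reshaped)]:
    print(name,
          "| base is not None:", arr.base is not None,
          "| OWNDATA:", arr.flags['OWNDATA'],
          "| may_share_memory with original:",
          np.may_share_memory(arr, original))
```

Both the base test and OWNDATA flag reshaped as a view, even though no other array can reach its data, which is the sort of conservative answer that leads to over-copying.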
Depending on your use case, flags['OWNDATA'] would do the job. In fact, there is no problem in the case you linked: the flag does not fail there. It always does what it is supposed to do.
According to the numpy.require documentation (http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.require.html), the flag is meant to "ensure an array that owns its own data".
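As a rough illustration of that wording, the sketch below asks numpy.require for the 'O' (OWNDATA) requirement on a view. The array names are only for the example, and this is one possible way to force an owning array, not necessarily what pandas should do.

```python
import numpy as np

a = np.arange(10)
v = a[2:5]                                  # a view into a
print(v.flags['OWNDATA'])                   # the view does not own its data

# Requesting 'O' makes numpy.require return an array that owns its data.
owned = np.require(v, requirements=['O'])
print(owned.flags['OWNDATA'])

owned[0] = -1                               # writes go to the private copy
print(a[2])                                 # a is left unchanged
```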
In your "counterexample", they use the code:
    print(b.flags['OWNDATA'])  # False -- apparently this is a view
    e = np.ravel(b[:, 2])
    print(e.flags['OWNDATA'])  # True -- apparently this is a new numpy object
But True is the normal behaviour in the second case. It follows from the definition of ravel (from http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.ravel.html):
Return a contiguous flattened array. A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.
Here a copy is needed (b[:, 2] is a column slice, which is generally not contiguous in memory), so a copy is made. The variable e therefore really owns its own data. It is not a "view of b", "a reference to b", or "an alias to a part of b". It is a genuinely new array that contains a copy of some elements of b.
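A small sketch of both cases, using an assumed 3x4 array for b (the linked counterexample's actual setup is not reproduced here): raveling a column slice has to copy, while raveling the contiguous array itself can return a view.

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)            # b is a view of a

e = np.ravel(b[:, 2])          # column slice is strided, so ravel copies
f = np.ravel(b)                # b is contiguous, so ravel returns a view

print(e.flags['OWNDATA'], np.may_share_memory(e, a))
print(f.flags['OWNDATA'], np.may_share_memory(f, a))

e[0] = -1                      # modifies the copy only
f[0] = 99                      # writes through to a and b
print(a[2], a[0])
```

Writing to e leaves a untouched, whereas writing to f shows up in a, which matches what OWNDATA reported for each.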
So I think it is impossible to detect that kind of behaviour without tracking the entire origin of the data. I believe you should be able to build your program on that flag.