 

numpy: Reliable (non-conservative) indicator if numpy array is view

Looking for a way to reliably identify if a numpy object is a view.

Related questions have come up many times before (here, here, here), and people have offered some solutions, but all seem to have problems:

  • The test used in pandas now is to call something a view if my_array.base is not None. This seems to always catch views, but it also gives lots of false positives (situations where it reports that something is a view even though it isn't).
  • numpy.may_share_memory() checks two specific arrays against each other, but cannot answer the question for a single array on its own.
    • (@RobertKern said in 2012 that this was the best tool available. Has anything changed since?)
  • flags['OWNDATA'] is reported (third comment on the first answer) to fail in some cases.
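A minimal sketch of how the two single-array indicators and may_share_memory() behave on three small arrays. The array names are illustrative, not taken from the linked questions. The reshaped case is the kind of false positive described above: its base is a temporary that nothing else references, yet both the .base test and OWNDATA report it as a view.

```python
import numpy as np

def describe(name, arr):
    """Print both single-array indicators for one array."""
    print(f"{name}: base is not None -> {arr.base is not None}, "
          f"OWNDATA -> {arr.flags['OWNDATA']}")

owner = np.zeros((3, 4))                 # freshly allocated: owns its buffer
view = owner[:, 1]                       # basic slicing: a genuine view of owner
reshaped = np.arange(12).reshape(3, 4)   # view of a temporary nobody else holds

describe("owner", owner)
describe("view", view)
describe("reshaped", reshaped)

# may_share_memory only answers the question for a pair of arrays
print(np.may_share_memory(owner, view))      # True
print(np.may_share_memory(owner, reshaped))  # False
```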

(The reason for my interest is that I'm working on implementing copy-on-write for pandas, and a conservative indicator is leading to over-copying.)

nick_eu asked Nov 04 '15 19:11



1 Answer

Depending on your use case, flags['OWNDATA'] should do the job. In fact, there is no problem with the case in your link: the flag does not fail there. It always does what it is documented to do.

According to the numpy.require documentation (http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.require.html), the OWNDATA requirement means "ensure an array that owns its own data".
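As a sketch, assuming a "view" is defined simply as an array that does not own its buffer, the check reduces to a single flag lookup. is_view is a hypothetical helper name, not a NumPy function. The last lines use numpy.require with the OWNDATA requirement, which returns a copy only when the flag is not already set.

```python
import numpy as np

def is_view(arr: np.ndarray) -> bool:
    """Hypothetical helper: treat an array as a view when it does not own its buffer."""
    return not arr.flags['OWNDATA']

base = np.arange(10)
sliced = base[2:5]           # basic slicing returns a view
copied = base[2:5].copy()    # .copy() always allocates a new buffer

print(is_view(base), is_view(sliced), is_view(copied))  # False True False

# np.require copies only because sliced does not own its data
owned = np.require(sliced, requirements=['OWNDATA'])
print(owned.flags['OWNDATA'], owned is sliced)  # True False
```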

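In your "counterexample", the code is the following (the definition of b is not shown there, so a hypothetical one is added to make it runnable):

    import numpy as np

    b = np.arange(12).reshape(3, 4)  # hypothetical setup: the original definition of b is not shown
    print(b.flags['OWNDATA'])        # False: apparently this is a view
    e = np.ravel(b[:, 2])
    print(e.flags['OWNDATA'])        # True: apparently this is a new numpy object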

But True is the expected result in the second case.

This follows from the definition of ravel (from http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.ravel.html):

Return a contiguous flattened array. A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.

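Here a copy is needed, because b[:, 2] is a column of a 2-D array and (with NumPy's default row-major layout) its elements are not adjacent in memory, so a copy is made. The variable e therefore really owns its data. It is not a "view of b", "a reference to b", or "an alias to a part of b". It is a genuinely new array that contains a copy of some elements of b.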
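To illustrate "a copy is made only if needed", here is a sketch comparing ravel on a row and on a column of the same C-contiguous array (the 3x4 shape is arbitrary). np.shares_memory, available since NumPy 1.11, confirms which result still points into b.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)  # illustrative C-contiguous 3x4 array

row = np.ravel(b[1, :])  # a row is already contiguous: ravel returns a view
col = np.ravel(b[:, 2])  # a column is strided: ravel has to copy

print(row.flags['OWNDATA'], np.shares_memory(row, b))  # False True
print(col.flags['OWNDATA'], np.shares_memory(col, b))  # True False
```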

So I think it is impossible to detect that kind of relationship without tracking the entire origin of the data. You should be able to build your program on that flag.
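A minimal sketch of how the flag could drive a copy-on-write guard, assuming the only question is whether the array about to be written to is itself a view. prepare_for_write is a hypothetical helper, not part of NumPy or pandas. It does not detect other views pointing into an array that owns its data, and, like the .base test, it would still copy an array such as b below, which is a reshape of an otherwise unreferenced temporary.

```python
import numpy as np

def prepare_for_write(arr: np.ndarray) -> np.ndarray:
    """Hypothetical copy-on-write guard: copy only when arr does not own its buffer."""
    return arr if arr.flags['OWNDATA'] else arr.copy()

b = np.arange(12).reshape(3, 4)

col_view = b[:, 2]                 # a view of b: the guard copies it
safe = prepare_for_write(col_view)
safe[0] = 100                      # writes to the copy, so b is unchanged
print(b[0, 2])                     # 2

e = np.ravel(b[:, 2])              # already owns its data: returned as-is
print(prepare_for_write(e) is e)   # True
```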

Alexis Clarembeau answered Oct 03 '22 07:10