Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is np.where's result read-only for multi-dimensional arrays?

Tags:

python

numpy

>>> a = np.where(np.ones(5))[0]
>>> a
array([0, 1, 2, 3, 4])
>>> a.flags['WRITEABLE']
True

>>> b = np.where(np.ones((5,2)))[0]
>>> b
array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
>>> b.flags['WRITEABLE']
False

Why is b read-only while a is not? This is not mentioned in the documentation.

like image 427
1'' Avatar asked Feb 22 '15 20:02

1''


2 Answers

Internally numpy computes results in a one dimensional array, but then returns a tuple which are views with same stride on this array, one for each dimension;

for example, if the array is like this:

>>> a
array([[0, 1, 0],
       [3, 0, 5]])

internally numpy first calculates

>>> i
array([0, 1, 1, 0, 1, 2])

where i[2 * k] and i[2 * k + 1] are the indices of the k'th non-zero value; but, then the result that is returned is:

>>> (i[::2], i[1::2])
(array([0, 1, 1]), array([1, 0, 2]))

when, it creates the view it passes 0 as the flags argument. So, the writable flag is unset.

When the input array is one dimensional, it takes a short-cut and therefore the flags are set differently.

like image 103
behzad.nouri Avatar answered Oct 21 '22 05:10

behzad.nouri


The __array_interface__ gives evidence that the elements of b are views:

In [335]: b[0].__array_interface__
Out[335]: 
{'shape': (10,),
 'typestr': '<i4',
 'strides': (8,),
 'version': 3,
 'descr': [('', '<i4')],
 'data': (145873656, True)}

In [336]: b[1].__array_interface__
Out[336]: 
{'shape': (10,),
 'typestr': '<i4',
 'strides': (8,),
 'version': 3,
 'descr': [('', '<i4')],
 'data': (145873660, True)}

The shape and strides are the same. The data pointer differs by 4 bytes. That's what one would expect from views generated by (i[::2], i[1::2]). A strides of (8,) means every other integer.

And the internal 2d ret array can be 'recreated' with as_strided:

  In [362]: np.lib.stride_tricks.as_strided(b[0],(10,2),(8,4))
Out[362]: 
array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1],
       [2, 0],
       [2, 1],
       [3, 0],
       [3, 1],
       [4, 0],
       [4, 1]])

The tuple form is most useful for use as an index. Still it is interesting to know that underlying that tuple is a 2d array.


This is a Python replica of the C nonzero function (the ndim>1 case). It's in the last step, written as tuple(ret.T) where the readonly views are generated. As expected this is much slower than the C code. It should be an interesting cython adaptation exercise.

def nonzero(a):
    nnz = np.count_nonzero(a)
    ndim = a.ndim
    ret = np.zeros((nnz, ndim), dtype=np.intp)
    it = np.nditer(a,flags=['multi_index', 'zerosize_ok', 'refs_ok'],
            op_flags=['readonly'])
    n = 0
    while not it.finished:
        if it.value!=0:
            ret[n] = it.multi_index
            n += 1
        it.iternext()
    # tuple(ret[:,i] for i in range(ndim))
    return tuple(ret.T)
like image 22
hpaulj Avatar answered Oct 21 '22 07:10

hpaulj