
output of numpy.where(condition) is not an array, but a tuple of arrays: why?

I am experimenting with the numpy.where(condition[, x, y]) function.
From the numpy documentation, I learn that if you give just one array as input, it should return the indices where the array is non-zero (i.e. "True"):

If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.

But if I try it, it returns a tuple of two elements, where the first is the wanted list of indices and the second is a null element:

    >>> import numpy as np
    >>> array = np.array([1,2,3,4,5,6,7,8,9])
    >>> np.where(array>4)
    (array([4, 5, 6, 7, 8]),) # notice the comma before the last parenthesis

So the question is: why? What is the purpose of this behaviour? In what situation is this useful? Indeed, to get the wanted list of indices I have to add indexing, as in np.where(array>4)[0], which seems... "ugly".


ADDENDUM

I understand (from some answers) that it is actually a tuple of just one element. Still, I don't understand why the output is given this way. To illustrate how this is not ideal, consider the following error (which motivated my question in the first place):

    >>> import numpy as np
    >>> array = np.array([1,2,3,4,5,6,7,8,9])
    >>> pippo = np.where(array>4)
    >>> pippo + 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only concatenate tuple (not "int") to tuple

so that you need to do some indexing to access the actual array of indices:

    >>> pippo[0] + 1
    array([5, 6, 7, 8, 9])

Fabio asked Nov 17 '15 01:11



2 Answers

In Python, (1) means just 1. Parentheses can be freely added to group numbers and expressions for human readability (compare (1+3)*3 with (1+3,)*3). Thus, to denote a one-element tuple, Python uses (1,) (and requires you to use it as well).
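To make the grouping-versus-tuple distinction concrete, here is a small sketch in plain Python (the variable names are only for illustration):

```python
# Parentheses alone only group; the trailing comma is what makes a tuple.
grouped = (1)     # a plain int
single = (1,)     # a one-element tuple

print(type(grouped).__name__)   # int
print(type(single).__name__)    # tuple

# The same distinction changes what * does:
print((1 + 3) * 3)     # 12, integer multiplication
print((1 + 3,) * 3)    # (4, 4, 4), tuple repetition
```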

Thus

(array([4, 5, 6, 7, 8]),) 

is a one element tuple, that element being an array.

If you applied where to a 2d array, the result would be a 2 element tuple.

The result of where is such that it can be plugged directly into an indexing slot, e.g.

    a[where(a>0)]
    a[a>0]

should return the same thing, as would

    I,J = where(a>0)   # a is 2d
    a[I,J]
    a[(I,J)]
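As a runnable sketch of the 2d case (the small array here is made up for illustration), the tuple from where can be used directly as an index, or unpacked into row and column arrays:

```python
import numpy as np

# A small, made-up 2d array for illustration.
a = np.array([[0, 3, 0],
              [5, 0, 7]])

idx = np.where(a > 0)   # tuple of two index arrays: (rows, columns)
I, J = idx

print(len(idx))   # 2
print(I)          # [0 1 1]
print(J)          # [1 0 2]

# All three forms of indexing select the same elements.
print(a[idx])     # [3 5 7]
print(a[I, J])    # [3 5 7]
print(a[a > 0])   # [3 5 7]
```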

Or with your example:

    In [278]: a=np.array([1,2,3,4,5,6,7,8,9])
    In [279]: np.where(a>4)
    Out[279]: (array([4, 5, 6, 7, 8], dtype=int32),)  # tuple

    In [280]: a[np.where(a>4)]
    Out[280]: array([5, 6, 7, 8, 9])

    In [281]: I=np.where(a>4)
    In [282]: I
    Out[282]: (array([4, 5, 6, 7, 8], dtype=int32),)
    In [283]: a[I]
    Out[283]: array([5, 6, 7, 8, 9])

    In [286]: i, = np.where(a>4)   # note the , on LHS
    In [287]: i
    Out[287]: array([4, 5, 6, 7, 8], dtype=int32)  # not tuple
    In [288]: a[i]
    Out[288]: array([5, 6, 7, 8, 9])
    In [289]: a[(i,)]
    Out[289]: array([5, 6, 7, 8, 9])

======================

np.flatnonzero shows the correct way of returning just one array, regardless of the dimensions of the input array.

    In [299]: np.flatnonzero(a>4)
    Out[299]: array([4, 5, 6, 7, 8], dtype=int32)
    In [300]: np.flatnonzero(a>4)+10
    Out[300]: array([14, 15, 16, 17, 18], dtype=int32)

Its docs say:

This is equivalent to a.ravel().nonzero()[0]

In fact that is literally what the function does.

Flattening a removes the question of what to do with multiple dimensions. The function then takes the result out of the tuple, giving you a plain array. With flattening, it doesn't have to make a special case for 1d arrays.
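A short sketch of what that means for a 2d input (the array is made up for illustration). The np.unravel_index step at the end is my own addition, not something from the flatnonzero docs; it just shows one way to get back to per-dimension indices if you need them:

```python
import numpy as np

# A small, made-up 2d array for illustration.
b = np.array([[0, 3, 0],
              [5, 0, 7]])

flat = np.flatnonzero(b > 0)   # indices into the flattened array
print(flat)                    # [1 3 5]
print(b.ravel()[flat])         # [3 5 7]

# Same result as the expression quoted from the docs.
same = np.array_equal(flat, (b > 0).ravel().nonzero()[0])
print(same)                    # True

# Assumption/addition: convert flat indices back to (row, column) pairs.
rows, cols = np.unravel_index(flat, b.shape)
print(rows, cols)              # [0 1 1] [1 0 2]
```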

===========================

@Divakar suggests np.argwhere:

    In [303]: np.argwhere(a>4)
    Out[303]:
    array([[4],
           [5],
           [6],
           [7],
           [8]], dtype=int32)

which does np.transpose(np.where(a>4))

Or if you don't like the column vector, you could transpose it again

    In [307]: np.argwhere(a>4).T
    Out[307]: array([[4, 5, 6, 7, 8]], dtype=int32)

except now it is a 1xn array.

We could just as well have wrapped where in array:

    In [311]: np.array(np.where(a>4))
    Out[311]: array([[4, 5, 6, 7, 8]], dtype=int32)

There are lots of ways of taking an array out of the where tuple ([0], i,=, transpose, array, etc).
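Put side by side, the options look roughly like this for the 1d example from the question (the variable names are only for illustration):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
cond = a > 4

by_index = np.where(cond)[0]         # index into the tuple
unpacked, = np.where(cond)           # unpack the one-element tuple
flat = np.flatnonzero(cond)          # already a plain 1d array
column = np.argwhere(cond)           # 2d, one row per matching element
stacked = np.array(np.where(cond))   # 2d, one row per dimension

print(by_index + 1)                  # [5 6 7 8 9]
print(column.shape, stacked.shape)   # (5, 1) (1, 5)
```

The first three give the same 1d array, so arithmetic like + 1 works directly; the last two keep an extra axis.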

hpaulj answered Sep 21 '22 10:09


Short answer: np.where is designed to have consistent output regardless of the dimension of the array.

A two-dimensional array has two indices, so the result of np.where is a length-2 tuple containing the relevant indices. This generalizes to a length-3 tuple for 3 dimensions, a length-4 tuple for 4 dimensions, or a length-N tuple for N dimensions. By this rule, it is clear that in 1 dimension, the result should be a length-1 tuple.
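A quick way to check this rule yourself (the shapes below are arbitrary, chosen just for illustration):

```python
import numpy as np

# Arbitrary example shapes with 1, 2 and 3 dimensions.
shapes = [(4,), (2, 3), (2, 3, 4)]
lengths = []

for shape in shapes:
    cond = np.ones(shape, dtype=bool)   # every element is True
    result = np.where(cond)
    lengths.append(len(result))
    print(cond.ndim, len(result))       # the two numbers always match

print(lengths)   # [1, 2, 3]
```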

jakevdp answered Sep 18 '22 10:09