I am experimenting with the numpy.where(condition[, x, y]) function.
From the numpy documentation, I learn that if you give just one array as input, it should return the indices where the array is non-zero (i.e. "True"):
If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.
But if I try it, it returns a tuple of two elements, where the first is the wanted list of indices and the second is a null element:
>>> import numpy as np
>>> array = np.array([1,2,3,4,5,6,7,8,9])
>>> np.where(array>4)
(array([4, 5, 6, 7, 8]),) # notice the comma before the last parenthesis
So the question is: why? What is the purpose of this behaviour? In what situation is this useful? Indeed, to get the wanted list of indices I have to add the indexing, as in np.where(array>4)[0], which seems... "ugly".
ADDENDUM
I understand (from some answers) that it is actually a tuple of just one element. Still, I don't understand why the output is given in this way. To illustrate how this is not ideal, consider the following error (which motivated my question in the first place):
>>> import numpy as np
>>> array = np.array([1,2,3,4,5,6,7,8,9])
>>> pippo = np.where(array>4)
>>> pippo + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "int") to tuple
so that you need to do some indexing to access the actual array of indices:
>>> pippo[0] + 1
array([5, 6, 7, 8, 9])
numpy.where returns a tuple because each element of the tuple refers to a dimension: the first element holds the indices of the matching elements along the first dimension, the second element holds their indices along the second dimension, and so on.
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
It returns a new NumPy array, after filtering based on a condition, which is an array-like of boolean values. For example, if condition is np.array([True, True, False]) and our array is a = np.array([[1, 2, 3]]), then applying the condition to the array (a[:, condition]) gives the array array([[1, 2]]).
The numpy.where() function returns the indices of elements in an input array where the given condition is satisfied.
In Python (1) means just 1. () can be freely added to group numbers and expressions for human readability (e.g. (1+3)*3 vs. (1+3,)*3). Thus to denote a 1 element tuple it uses (1,) (and requires you to use it as well).
Thus (array([4, 5, 6, 7, 8]),) is a one element tuple, that element being an array.
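The difference between grouping parentheses and a one-element tuple can be checked directly. This is only a small illustration; the variable names grouped and repeated are made up for the example.

```python
# Parentheses alone only group; the trailing comma is what makes a tuple.
grouped = (1 + 3) * 3      # plain arithmetic: 4 * 3
repeated = (1 + 3,) * 3    # tuple repetition: (4,) repeated three times

print(grouped)                  # 12
print(repeated)                 # (4, 4, 4)
print(type((1)), type((1,)))    # <class 'int'> <class 'tuple'>
```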
If you applied where to a 2d array, the result would be a 2 element tuple.
The result of where is such that it can be plugged directly into an indexing slot, e.g.
a[where(a>0)]
a[a>0]
should return the same things, as would
I,J = where(a>0) # a is 2d
a[I,J]
a[(I,J)]
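A minimal sketch of that equivalence for the 2d case, assuming a small made-up array (the values are not from the question):

```python
import numpy as np

# Hypothetical 2d array, chosen only to illustrate the point.
a = np.array([[1, -2, 3],
              [-4, 5, -6]])

idx = np.where(a > 0)      # tuple with one index array per dimension
I, J = idx                 # row indices, column indices

print(len(idx))            # 2
print(I, J)                # [0 0 1] [0 2 1]

via_where = a[idx]         # index with the tuple itself
via_mask = a[a > 0]        # index with the boolean mask
via_pair = a[I, J]         # index with the unpacked arrays

print(via_where)           # [1 3 5]
```

All three indexing forms select the same elements, which is why where hands back something shaped like an index tuple rather than a bare array.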
Or with your example:
In [278]: a=np.array([1,2,3,4,5,6,7,8,9])
In [279]: np.where(a>4)
Out[279]: (array([4, 5, 6, 7, 8], dtype=int32),) # tuple
In [280]: a[np.where(a>4)]
Out[280]: array([5, 6, 7, 8, 9])
In [281]: I=np.where(a>4)
In [282]: I
Out[282]: (array([4, 5, 6, 7, 8], dtype=int32),)
In [283]: a[I]
Out[283]: array([5, 6, 7, 8, 9])
In [286]: i, = np.where(a>4) # note the , on LHS
In [287]: i
Out[287]: array([4, 5, 6, 7, 8], dtype=int32) # not tuple
In [288]: a[i]
Out[288]: array([5, 6, 7, 8, 9])
In [289]: a[(i,)]
Out[289]: array([5, 6, 7, 8, 9])
======================
np.flatnonzero shows the correct way of returning just one array, regardless of the dimensions of the input array.
In [299]: np.flatnonzero(a>4)
Out[299]: array([4, 5, 6, 7, 8], dtype=int32)
In [300]: np.flatnonzero(a>4)+10
Out[300]: array([14, 15, 16, 17, 18], dtype=int32)
Its doc says:
This is equivalent to a.ravel().nonzero()[0]
In fact that is literally what the function does.
Flattening a removes the question of what to do with multiple dimensions. It then takes the result out of the tuple, giving you a plain array. With flattening it doesn't have to make a special case for 1d arrays.
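To see the flattening at work on more than one dimension, here is a small sketch with a made-up 2d array b. Mapping the flat indices back with np.unravel_index is an addition for illustration, not something the answer above relies on.

```python
import numpy as np

# Hypothetical 2d array, used only for this illustration.
b = np.array([[1, 7, 3],
              [8, 2, 9]])

flat_idx = np.flatnonzero(b > 4)         # indices into the flattened array
manual = (b > 4).ravel().nonzero()[0]    # the documented equivalent, spelled out

print(flat_idx)                          # [1 3 5]

# np.unravel_index converts flat indices back to per-dimension indices,
# recovering the same index arrays np.where would give for this array.
rows, cols = np.unravel_index(flat_idx, b.shape)
print(rows, cols)                        # [0 1 1] [1 0 2]
```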
===========================
@Divakar suggests np.argwhere:
In [303]: np.argwhere(a>4)
Out[303]:
array([[4],
       [5],
       [6],
       [7],
       [8]], dtype=int32)
which does np.transpose(np.where(a>4))
Or if you don't like the column vector, you could transpose it again
In [307]: np.argwhere(a>4).T
Out[307]: array([[4, 5, 6, 7, 8]], dtype=int32)
except now it is a 1xn array.
We could just as well have wrapped where in array:
In [311]: np.array(np.where(a>4))
Out[311]: array([[4, 5, 6, 7, 8]], dtype=int32)
Lots of ways of taking an array out of the where tuple ([0], i,=, transpose, array, etc).
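For comparison, the options can be put side by side on the array from the question. The variable names here (by_index, by_unpack, and so on) are invented for the example.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
result = np.where(a > 4)             # one-element tuple

by_index = result[0]                 # index into the tuple
by_unpack, = result                  # unpack the single element
by_flat = np.flatnonzero(a > 4)      # returns a plain 1d array directly
by_argwhere = np.argwhere(a > 4)     # 2d, one row per match
by_array = np.array(result)          # 2d, one row per dimension

print(by_index)                      # [4 5 6 7 8]
print(by_argwhere.shape)             # (5, 1)
print(by_array.shape)                # (1, 5)
```

The first three give the same 1d index array. The last two keep a second axis, so which one fits depends on what the calling code expects.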
Short answer: np.where is designed to have consistent output regardless of the dimension of the array.
A two-dimensional array has two indices, so the result of np.where is a length-2 tuple containing the relevant indices. This generalizes to a length-3 tuple for 3 dimensions, a length-4 tuple for 4 dimensions, or a length-N tuple for N dimensions. By this rule, it is clear that in 1 dimension, the result should be a length-1 tuple.
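The rule is easy to check with a quick loop. This sketch uses arrays of ones with two entries per axis as stand-in data; only the number of dimensions matters here.

```python
import numpy as np

lengths = []
for ndim in (1, 2, 3, 4):
    arr = np.ones((2,) * ndim)    # stand-in array with `ndim` dimensions
    idx = np.where(arr > 0)       # tuple of index arrays
    lengths.append(len(idx))

print(lengths)                    # [1, 2, 3, 4]
```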