output of numpy.where(condition) is not an array, but a tuple of arrays: why?

Tags: python, arrays, numpy

I am experimenting with the numpy.where(condition[, x, y]) function.
From the numpy documentation, I learn that if you give just one array as input, it should return the indices where the array is non-zero (i.e. "True"):

If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.

But if I try it, it returns a tuple of two elements, where the first is the wanted list of indices and the second is a null element:

    >>> import numpy as np
    >>> array = np.array([1,2,3,4,5,6,7,8,9])
    >>> np.where(array>4)
    (array([4, 5, 6, 7, 8]),) # notice the comma before the last parenthesis

So the question is: why? What is the purpose of this behaviour? In what situation is this useful? Indeed, to get the wanted list of indices I have to add the indexing, as in np.where(array>4)[0], which seems... "ugly".

ADDENDUM

I understand (from some answers) that it is actually a tuple of just one element. Still, I don't understand why the output is given this way. To illustrate how this is not ideal, consider the following error (which motivated my question in the first place):

    >>> import numpy as np
    >>> array = np.array([1,2,3,4,5,6,7,8,9])
    >>> pippo = np.where(array>4)
    >>> pippo + 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only concatenate tuple (not "int") to tuple

so that you need to do some indexing to access the actual array of indices:

    >>> pippo[0] + 1
    array([5, 6, 7, 8, 9])

402

asked Nov 17 '15 01:11

Fabio

2 Answers

In Python, (1) means just 1. Parentheses can be freely added to group numbers and expressions for human readability (compare (1+3)*3 with (1+3,)*3). So to denote a one-element tuple, Python uses (1,) (and requires you to write it that way as well).

Thus

(array([4, 5, 6, 7, 8]),)

is a one element tuple, that element being an array.
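As a minimal sketch of the syntax point above (the variable names are only illustrative), the trailing comma, not the parentheses, is what creates the tuple:

```python
# Parentheses alone only group an expression; the trailing comma makes the tuple.
grouped = (1 + 3) * 3      # plain arithmetic on an int
repeated = (1 + 3,) * 3    # a one-element tuple repeated three times

print(grouped)             # 12
print(repeated)            # (4, 4, 4)
print(type((1)), type((1,)))

# Unpacking a one-element tuple also needs the comma on the left-hand side.
single = (42,)
value, = single
print(value)               # 42
```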

If you applied where to a 2d array, the result would be a 2 element tuple.

The result of where is such that it can be plugged directly into an indexing slot, e.g.

    a[where(a>0)]
    a[a>0]

should return the same things

as would

    I,J = where(a>0)   # a is 2d
    a[I,J]
    a[(I,J)]

Or with your example:

    In [278]: a=np.array([1,2,3,4,5,6,7,8,9])
    In [279]: np.where(a>4)
    Out[279]: (array([4, 5, 6, 7, 8], dtype=int32),)  # tuple

    In [280]: a[np.where(a>4)]
    Out[280]: array([5, 6, 7, 8, 9])

    In [281]: I=np.where(a>4)
    In [282]: I
    Out[282]: (array([4, 5, 6, 7, 8], dtype=int32),)
    In [283]: a[I]
    Out[283]: array([5, 6, 7, 8, 9])

    In [286]: i, = np.where(a>4)   # note the , on LHS
    In [287]: i
    Out[287]: array([4, 5, 6, 7, 8], dtype=int32)  # not tuple
    In [288]: a[i]
    Out[288]: array([5, 6, 7, 8, 9])
    In [289]: a[(i,)]
    Out[289]: array([5, 6, 7, 8, 9])
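The 2d case mentioned above can be sketched the same way. The small array here is a made-up example, not taken from the question; it just shows that the tuple returned by np.where has one index array per dimension and can go straight into an indexing slot:

```python
import numpy as np

# Hypothetical 2d example array (not from the original post).
a = np.array([[0, 1, 0],
              [2, 0, 3]])

idx = np.where(a > 0)      # tuple with one index array per dimension
rows, cols = idx

print(len(idx))            # 2
print(rows, cols)          # [0 1 1] [1 0 2]

# All three forms select the same elements.
print(a[idx])              # [1 2 3]
print(a[rows, cols])       # [1 2 3]
print(a[a > 0])            # [1 2 3]
```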

======================

np.flatnonzero shows the correct way of returning just one array, regardless of the dimensions of the input array.

    In [299]: np.flatnonzero(a>4)
    Out[299]: array([4, 5, 6, 7, 8], dtype=int32)
    In [300]: np.flatnonzero(a>4)+10
    Out[300]: array([14, 15, 16, 17, 18], dtype=int32)

Its doc says:

This is equivalent to a.ravel().nonzero()[0]

In fact that is literally what the function does.

By flattening a, it removes the question of what to do with multiple dimensions. And then it takes the result out of the tuple, giving you a plain array. With flattening, it doesn't have to make a special case for 1d arrays.
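To illustrate the flattening on something other than a 1d array, here is a small sketch with a made-up 2d array. It also uses np.unravel_index, which is not mentioned above, to turn the flat indices back into per-dimension indices if those are needed:

```python
import numpy as np

# Hypothetical 2d example array (not from the original post).
a = np.array([[0, 1, 0],
              [2, 0, 3]])

flat = np.flatnonzero(a > 0)            # always a single 1d array
print(flat)                             # [1 3 5]

# Same result as the expression quoted from the docs.
same = (a > 0).ravel().nonzero()[0]
print(np.array_equal(flat, same))       # True

# The flat indices address the flattened array...
print(a.ravel()[flat])                  # [1 2 3]

# ...and np.unravel_index converts them back to per-dimension indices.
rows, cols = np.unravel_index(flat, a.shape)
print(rows, cols)                       # [0 1 1] [1 0 2]
```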

===========================

@Divakar suggests np.argwhere:

    In [303]: np.argwhere(a>4)
    Out[303]:
    array([[4],
           [5],
           [6],
           [7],
           [8]], dtype=int32)

which does np.transpose(np.where(a>4))

Or if you don't like the column vector, you could transpose it again

    In [307]: np.argwhere(a>4).T
    Out[307]: array([[4, 5, 6, 7, 8]], dtype=int32)

except now it is a 1xn array.

We could just as well have wrapped where in array:

    In [311]: np.array(np.where(a>4))
    Out[311]: array([[4, 5, 6, 7, 8]], dtype=int32)

There are lots of ways of taking an array out of the where tuple ([0], i,=, transpose, array, etc).
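For a side-by-side comparison, the following sketch collects these options on the array from the question and checks that they agree. The variable names are only illustrative:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
result = np.where(a > 4)                 # one-element tuple

by_index = result[0]                     # explicit indexing
by_unpack, = result                      # tuple unpacking (note the comma)
by_flat = np.flatnonzero(a > 4)          # returns a plain 1d array
by_argwhere = np.argwhere(a > 4)[:, 0]   # first column of the (n, 1) array

for candidate in (by_unpack, by_flat, by_argwhere):
    print(np.array_equal(by_index, candidate))   # True each time

# argwhere matches the transposed where result, as stated above.
print(np.array_equal(np.argwhere(a > 4), np.transpose(result)))  # True

# Wrapping the tuple in np.array gives a 1 x n array instead.
print(np.array(result).shape)            # (1, 5)
```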

193

answered Sep 21 '22 10:09

hpaulj

Short answer: np.where is designed to have consistent output regardless of the dimension of the array.

A two-dimensional array has two indices, so the result of np.where is a length-2 tuple containing the relevant indices. This generalizes to a length-3 tuple for 3-dimensions, a length-4 tuple for 4 dimensions, or a length-N tuple for N dimensions. By this rule, it is clear that in 1 dimension, the result should be a length-1 tuple.
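A quick way to check this rule is to compare the tuple length with the number of dimensions. The shapes below are arbitrary examples chosen for the sketch:

```python
import numpy as np

lengths = []
for shape in [(4,), (2, 3), (2, 3, 4)]:
    cond = np.ones(shape, dtype=bool)    # every element is True
    idx = np.where(cond)                 # tuple of index arrays
    lengths.append(len(idx))
    print(cond.ndim, len(idx))           # the two numbers match

print(lengths)                           # [1, 2, 3]
```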

answered Sep 18 '22 10:09

jakevdp
