
Understanding weird boolean 2d-array indexing behavior in numpy

Tags:

python

numpy

Why does this work:

import numpy as np

a=np.random.rand(10,20)
x_range=np.arange(10)
y_range=np.arange(20)

a_tmp=a[x_range<5,:]
b=a_tmp[:,np.in1d(y_range,[3,4,8])]

and this does not:

a=np.random.rand(10,20)
x_range=np.arange(10)
y_range=np.arange(20)

b=a[x_range<5,np.in1d(y_range,[3,4,8])]
tillsten asked Oct 19 '11 11:10



1 Answer

The NumPy reference documentation's page on indexing contains the answer, but it requires a bit of careful reading.

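The answer here is that indexing with booleans is equivalent to indexing with the integer arrays obtained by first transforming the boolean arrays with np.nonzero. Note that nonzero returns a tuple of index arrays, one per dimension, so for 1-D boolean arrays m1, m2

a[m1, m2] == a[m1.nonzero()[0], m2.nonzero()[0]]

which (when it succeeds, i.e., when the two index arrays can be broadcast against each other, typically because m1 and m2 contain the same number of True values) is equivalent to:

[a[i, j] for i, j in zip(m1.nonzero()[0], m2.nonzero()[0])]

That is, the selected row and column indices are paired up element by element instead of being combined into a grid. In your example the masks select 5 rows and 3 columns, so the index arrays have shapes (5,) and (3,) and cannot be paired, which is why the one-step version fails.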
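As a quick check of this pairing behaviour, here is a minimal sketch. The array contents, the seed, and the variable names (rows, cols, paired, and so on) are made up for illustration, and np.isin is used in place of np.in1d from the question. The masks are chosen so that both select exactly three entries; the `[0]` after nonzero() picks the single index array out of the tuple that nonzero returns.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((10, 20))

# Two masks with the same number of True entries (three each).
m1 = np.arange(10) < 3                   # rows 0, 1, 2
m2 = np.isin(np.arange(20), [3, 4, 8])   # columns 3, 4, 8

# nonzero() returns a tuple of index arrays; take the one for axis 0.
rows = m1.nonzero()[0]
cols = m2.nonzero()[0]

paired = a[m1, m2]            # boolean masks in both positions
via_nonzero = a[rows, cols]   # same thing with integer index arrays
via_loop = np.array([a[i, j] for i, j in zip(rows, cols)])

# With 5 selected rows and 3 selected columns the index arrays
# cannot be paired, so the lookup is rejected.
m1_five = np.arange(10) < 5
try:
    a[m1_five, m2]
    mismatch_raises = False
except (IndexError, ValueError):
    mismatch_raises = True
```

All three results should be the same length-3 array holding a[0, 3], a[1, 4] and a[2, 8], and the mismatched case should raise.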

I'm not sure why it was designed to work like this --- usually, this is not what you'd want.

To get the more intuitive result, you can instead do

a[np.ix_(m1, m2)]

which produces a result equivalent to

[[a[i,j] for j in range(a.shape[1]) if m2[j]] for i in range(a.shape[0]) if m1[i]]
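To see that this matches the two-step slicing from the question, here is a small sketch under the same setup (5 selected rows, 3 selected columns). The seed and variable names are illustrative, and np.isin stands in for np.in1d.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((10, 20))

m1 = np.arange(10) < 5                   # select the first 5 rows
m2 = np.isin(np.arange(20), [3, 4, 8])   # select columns 3, 4, 8

# The two-step approach from the question: rows first, then columns.
two_step = a[m1, :][:, m2]

# np.ix_ builds an open mesh from the masks, so every selected row
# is combined with every selected column in a single indexing step.
outer = a[np.ix_(m1, m2)]

# The explicit nested-loop equivalent.
via_loops = np.array([[a[i, j] for j in range(a.shape[1]) if m2[j]]
                      for i in range(a.shape[0]) if m1[i]])
```

All three should give the same 5x3 sub-array.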
pv. answered Nov 10 '22 12:11
