
Numpy, how to get a sub matrix with boolean slicing

How can I extract a sub-matrix from a NumPy matrix with boolean slicing, the same way I can extract a sub-array from an array?

For example:

    import numpy as np
    a2 = np.array(np.arange(30).reshape(5, 6))
    a2[a2[:, 1] > 10]

will give me:

    array([[12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29]])

but:

    m2 = np.mat(np.arange(30).reshape(5, 6))
    m2[m2[:, 1] > 10]

will give me:

    matrix([[12, 18, 24]])

Why is the output different, and how can I get the same result from the matrix as from the array?

Thank you!

asked Sep 18 '14 00:09 by pinseng


2 Answers

The issue you're experiencing comes down to the fact that operations on a matrix always return a 2-dimensional result.

When you build the mask on the first array, you get:

    In [24]: a2[:,1] > 10
    Out[24]: array([False, False,  True,  True,  True], dtype=bool)

which, as you can see, is a 1-dimensional array.

When you do the same thing with the matrix, you get:

    In [25]: m2[:,1] > 10
    Out[25]:
    matrix([[False],
            [False],
            [ True],
            [ True],
            [ True]], dtype=bool)

In other words, you have an n×1 matrix, not a 1-dimensional array of length n.
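The difference in shape can be checked directly. This is a minimal sketch; the names `mask_a` and `mask_m` are illustrative and not part of the original post, and `np.asmatrix` is used in place of `np.mat` on the assumption that a recent NumPy release is installed.

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)
m2 = np.asmatrix(a2)  # same data viewed as a matrix (stand-in for np.mat)

mask_a = a2[:, 1] > 10  # mask built from the array
mask_m = m2[:, 1] > 10  # mask built from the matrix

print(mask_a.shape)           # (5,)
print(mask_m.shape)           # (5, 1)
print(type(mask_m).__name__)  # matrix
```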


Indexing in NumPy behaves differently depending on whether the boolean mask is 1-dimensional or 2-dimensional.

In your first case, NumPy applies the mask of length n along the first axis, selecting whole rows, so you get the expected result:

    In [28]: a2[a2[:,1] > 10]
    Out[28]:
    array([[12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29]])

In the second case, the mask is 2-dimensional, so each True entry identifies both a row and a column. Because the mask has only one column, NumPy picks single elements from the first column of the matching rows rather than whole rows:

    In [29]: m2[m2[:,1] > 10]
    Out[29]: matrix([[12, 18, 24]])
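To see which elements a 2-dimensional mask points at, the positions of its True entries can be listed with `np.nonzero`. The sketch below uses a plain array and an illustrative name, `mask2d`, that is not in the original post. As far as I know, newer NumPy releases reject a boolean mask whose shape does not match the indexed object and raise an IndexError instead of returning the output above, so the example uses explicit indices rather than the mask itself.

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)

# Reshape the row mask to (5, 1) so it has the same shape as the matrix mask.
mask2d = (a2[:, 1] > 10).reshape(-1, 1)

# Each True entry carries a row index and a column index.
rows, cols = np.nonzero(mask2d)
print(rows)  # [2 3 4]
print(cols)  # [0 0 0]

# Indexing with those pairs picks single elements from column 0.
picked = a2[rows, cols]
print(picked)  # [12 18 24]
```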

To answer your question: you can get this behaviour by converting your mask to an array and taking its first column, which recovers the 1-dimensional mask of length n:

    In [32]: m2[np.array(m2[:,1] > 10)[:,0]]
    Out[32]:
    matrix([[12, 13, 14, 15, 16, 17],
            [18, 19, 20, 21, 22, 23],
            [24, 25, 26, 27, 28, 29]])

Alternatively, you could convert the matrix to an array first, which gives the same 1-dimensional mask as in the array case:

    In [34]: np.array(m2)[:,1] > 10
    Out[34]: array([False, False,  True,  True,  True], dtype=bool)
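Carried through to the end, the convert-first approach might look like the sketch below. The names `arr` and `sub` are illustrative, and `np.asmatrix` again stands in for `np.mat`. Note that the result here is a plain array, not a matrix.

```python
import numpy as np

m2 = np.asmatrix(np.arange(30).reshape(5, 6))

arr = np.asarray(m2)       # plain ndarray view of the matrix data
sub = arr[arr[:, 1] > 10]  # 1-D mask, so whole rows are selected

print(type(sub).__name__)  # ndarray
print(sub.shape)           # (3, 6)
```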

Now, both of those fixes require conversions between matrices and arrays, which can be pretty ugly.

The question you should ask yourself is why you want to use a matrix and yet expect the behaviour of an array. It could be that the right tool for your job is actually an array, not a matrix.

answered Sep 24 '22 01:09 by sapi


If you flatten the boolean mask like:

    m2[np.asarray(m2[:,1]>10).flatten()]

you get the same result, but I would recommend using np.array instead of np.matrix for the reasons given in this answer.
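A few equivalent ways of flattening the mask are sketched below, with illustrative variable names. The conversion to an array matters: calling `.flatten()` directly on a matrix keeps the result 2-dimensional. The `.A1` attribute of `np.matrix` returns the flattened data as a 1-dimensional array in one step.

```python
import numpy as np

m2 = np.asmatrix(np.arange(30).reshape(5, 6))  # stand-in for np.mat
mask = m2[:, 1] > 10                           # (5, 1) boolean matrix

flat_a = np.asarray(mask).flatten()  # copy, 1-D
flat_b = np.asarray(mask).ravel()    # 1-D, avoids a copy when possible
flat_c = mask.A1                     # matrix attribute: flattened ndarray

result = m2[flat_c]  # still a matrix, but now whole rows are selected
print(result.shape)  # (3, 6)
```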

answered Sep 26 '22 01:09 by Saullo G. P. Castro