
Filter values from a scipy sparse matrix

I am trying to filter out values smaller than 10 from a huge (1M x 1M) SciPy CSR matrix. Since all my values are integers, dividing by 10 and then multiplying by 10 again does the job, but I was wondering whether there is a better way to filter elements.

EDIT: The answer below works. Check that you have the latest version of SciPy.

Omer asked Feb 27 '14 16:02


2 Answers

You can also go with the less hacky, but probably slower:

m = m.multiply(m >= 10)

To understand what's going on:

>>> import numpy as np
>>> import scipy.sparse
>>> m = scipy.sparse.csr_matrix((1000, 1000), dtype=int)
>>> m[np.random.randint(0, 1000, 20),
      np.random.randint(0, 1000, 20)] = np.random.randint(0, 100, 20)
>>> m.data
array([92, 46, 99, 24, 75, 16, 49, 60, 87, 64, 91, 37, 30, 32, 25, 40, 99,
        9,  3, 84])
>>> m >= 10
<1000x1000 sparse matrix of type '<type 'numpy.bool_'>'
    with 18 stored elements in Compressed Sparse Row format>
>>> m = m.multiply(m >= 10)
>>> m
<1000x1000 sparse matrix of type '<type 'numpy.int32'>'
    with 18 stored elements in Compressed Sparse Row format>
>>> m.data
array([92, 46, 99, 24, 75, 16, 49, 60, 87, 64, 91, 37, 30, 32, 25, 40, 99,
       84])
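
The same idea on a small, fixed matrix, as a minimal sketch: the 3x3 array and the names `dense`, `mask`, and `filtered` are made up for illustration, and it assumes a SciPy version recent enough to support comparison operators on sparse matrices.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical 3x3 example with values on both sides of the threshold.
dense = np.array([[0, 3, 12],
                  [25, 0, 9],
                  [0, 10, 0]])
m = sp.csr_matrix(dense)

# Sparse boolean matrix: True only where a stored value is >= 10.
mask = m >= 10

# Element-wise product keeps the values under the mask and zeroes the rest.
filtered = m.multiply(mask).tocsr()
filtered.eliminate_zeros()  # harmless if no explicit zeros are stored

print(filtered.toarray())
```

Because the mask is itself sparse, the product never has to materialise the full matrix, which matters at the 1M x 1M scale in the question.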
Jaime answered Nov 14 '22 07:11


I think the version issue has to do with the implementation of the comparison operators: m >= 10 calls m.__ge__. (I don't have an earlier version of SciPy to test this, but I believe there are one or more SO threads on the topic.)

Something that might work in earlier versions is:

m.data *= m.data>=10
m.eliminate_zeros()

In other words, use a standard NumPy operation to set the selected values to 0 (the test could be a lot more complicated), and then use a standard sparse method to clean up. When you say 'filter', that's essentially what you want to do, isn't it: set some values to zero and remove them from the sparse matrix?
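
A self-contained sketch of that approach, wrapped in a helper so the original matrix is left untouched. The name `filter_below` and the small test matrix are assumptions made for illustration, not part of SciPy.

```python
import numpy as np
import scipy.sparse as sp

def filter_below(m, threshold):
    """Return a CSR copy of m with stored values below threshold removed.

    Hypothetical helper: only stored entries are inspected, so implicit
    zeros stay zero whatever the threshold is.
    """
    out = m.tocsr().copy()
    out.data[out.data < threshold] = 0  # plain NumPy masking on the stored values
    out.eliminate_zeros()               # drop the explicit zeros from the structure
    return out

m = sp.csr_matrix(np.array([[0, 3, 12],
                            [25, 0, 9],
                            [0, 10, 0]]))
filtered = filter_below(m, 10)
print(filtered.toarray())
```

Working on `.data` directly touches only the stored entries, so the cost scales with the number of nonzeros rather than with the matrix shape.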

hpaulj answered Nov 14 '22 07:11