 

A fast way to find nonzero entries by row in a sparse matrix in Python

I am trying to find the indices of the nonzero entries in each row of a sparse matrix (a scipy.sparse.csc_matrix). So far, I loop over every row of the matrix and apply numpy.nonzero() to it to get that row's nonzero column indices, but this takes over an hour. Is there a faster way to do this? Thanks!
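For reference, the per-row loop described above probably looks something like the sketch below. It is a hypothetical reconstruction (the question does not include code), and the helper name nonzero_cols_by_row_slow and the small test matrix are illustrative assumptions. Each iteration slices one row out of a CSC matrix, which stores its data column by column, and densifies it, which is a likely source of the slowdown on large matrices.

```python
import numpy as np
import scipy.sparse as sps


def nonzero_cols_by_row_slow(m):
    """Hypothetical reconstruction of the per-row loop from the question."""
    result = []
    for i in range(m.shape[0]):
        # Slice row i out of the CSC matrix and densify it to a 1-D array.
        row = m[i].toarray().ravel()
        # numpy.nonzero returns a tuple; [0] holds the column positions.
        result.append(np.nonzero(row)[0])
    return result


# Small illustrative matrix (an assumption, not the asker's data).
m = sps.csc_matrix(np.array([[0, 0, 3],
                             [4, 0, 5],
                             [0, 0, 0]]))
print(nonzero_cols_by_row_slow(m))
```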

asked Jul 16 '14 at 21:07 by user2498497




2 Answers

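Use the sparse matrix's own .nonzero() method. It returns a tuple of two arrays, the row indices and the column indices of every nonzero entry, in a single vectorized call with no Python-level loop over rows:

indices = sp_matrix.nonzero()  # (row_indices, col_indices)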

If you'd like the indices as (row, column) tuples, you can use zip (wrapped in list() on Python 3, where zip returns an iterator):

indices = list(zip(*sp_matrix.nonzero()))
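.nonzero() returns all the indices at once, but it does not group them by row. A minimal sketch of that grouping step, assuming NumPy and SciPy and using a hypothetical helper named group_cols_by_row, sorts the entries by row and then splits the column indices at each row boundary, so all-zero rows come back as empty arrays:

```python
import numpy as np
import scipy.sparse as sps


def group_cols_by_row(m):
    """Group the output of m.nonzero() into one column-index array per row."""
    rows, cols = m.nonzero()
    # The (row, col) pairs are not necessarily grouped by row (for CSC they
    # follow the column-major storage), so sort them by row. A stable sort
    # keeps the existing column order within each row.
    order = np.argsort(rows, kind="stable")
    rows, cols = rows[order], cols[order]
    # Nonzeros per row; minlength keeps trailing all-zero rows.
    counts = np.bincount(rows, minlength=m.shape[0])
    # Split the sorted column indices at each row boundary.
    return np.split(cols, np.cumsum(counts)[:-1])


# Small illustrative matrix (an assumption for demonstration only).
m = sps.csc_matrix(np.array([[0, 0, 3],
                             [4, 0, 5],
                             [0, 0, 0],
                             [0, 6, 0]]))
for i, cols in enumerate(group_cols_by_row(m)):
    print(i, cols)
```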
answered Nov 01 '22 at 08:11 by Madison May


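This is relatively straightforward for a CSR matrix: its indices array holds the column indices of all rows back to back, and indptr[i]:indptr[i+1] marks where row i starts and ends. So you can always convert to CSR and split: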

>>> import numpy as np
>>> import scipy.sparse as sps
>>> a = sps.rand(5, 5, .2, format='csc')
>>> a.toarray()
array([[ 0.        ,  0.        ,  0.68642384,  0.        ,  0.        ],
       [ 0.46120599,  0.        ,  0.83253467,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.07074811],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.21190832,  0.        ,  0.        ,  0.        ]])
>>> b = a.tocsr()
>>> np.split(b.indices, b.indptr[1:-1])
[array([2]), array([0, 2]), array([4]), array([], dtype=float64), array([1])]
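The same idea can be wrapped as a small function. This is a sketch, not code from the answer: the name nonzero_cols_by_row is hypothetical, and the eliminate_zeros() and sort_indices() calls are additions, because splitting b.indices reports every stored entry, including any explicitly stored zeros, in whatever order the matrix holds them.

```python
import numpy as np
import scipy.sparse as sps


def nonzero_cols_by_row(m):
    """Column indices of the nonzeros in each row, using the CSR layout."""
    # copy=True so the in-place clean-up below never touches the caller's matrix.
    b = sps.csr_matrix(m, copy=True)
    b.eliminate_zeros()  # drop explicitly stored zeros, if any
    b.sort_indices()     # put the column indices of each row in ascending order
    # Row i occupies b.indices[b.indptr[i]:b.indptr[i + 1]], so splitting at
    # the interior offsets b.indptr[1:-1] gives one array per row.
    return np.split(b.indices, b.indptr[1:-1])


# Small illustrative matrix (an assumption for demonstration only).
m = sps.csc_matrix(np.array([[0, 0, 3],
                             [4, 0, 5],
                             [0, 0, 0],
                             [0, 6, 0]]))
print(nonzero_cols_by_row(m))
```

An equivalent, more explicit form is a list comprehension over the rows, [b.indices[b.indptr[i]:b.indptr[i + 1]] for i in range(b.shape[0])]; np.split performs all of those slices in one call.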
answered Nov 01 '22 at 06:11 by Jaime