A fast way to find nonzero entries by row in a sparse matrix in Python

I am trying to find the column indices of the nonzero entries in each row of a sparse matrix (a scipy.sparse.csc_matrix). So far, I am looping over each row in the matrix, and using


on each row to get the nonzero column indices. But this method takes over an hour to go through all the rows. Is there a faster way to do this? Thanks!

2 Answers

Use the .nonzero() method.

indices = sp_matrix.nonzero()
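
To make the return value concrete, here is a minimal sketch on a small hand-built matrix (the data and the name `dense` are made up for illustration). `.nonzero()` returns two parallel arrays, one of row indices and one of column indices. The order of the entries depends on the storage format, so the pairs are sorted here before printing.

```python
import numpy as np
import scipy.sparse as sps

# Small hand-built matrix (illustrative data only).
dense = np.array([[0, 0, 3],
                  [4, 0, 5],
                  [0, 0, 0]])
sp_matrix = sps.csc_matrix(dense)

# Two parallel arrays: rows[k], cols[k] is the position of the k-th nonzero.
rows, cols = sp_matrix.nonzero()

# Sort the (row, column) pairs so the result does not depend on storage order.
pairs = sorted(zip(rows.tolist(), cols.tolist()))
print(pairs)  # [(0, 2), (1, 0), (1, 2)]
```
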

If you'd like the indices as (row, column) tuples, you can use zip.

indices = list(zip(*sp_matrix.nonzero()))

It is relatively straightforward for a CSR matrix, so you can always convert to CSR and split the column indices at the row pointers:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> a = sps.rand(5, 5, .2, format='csc')
>>> a.toarray()
array([[ 0.        ,  0.        ,  0.68642384,  0.        ,  0.        ],
       [ 0.46120599,  0.        ,  0.83253467,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.07074811],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.21190832,  0.        ,  0.        ,  0.        ]])
>>> b = a.tocsr()
>>> np.split(b.indices, b.indptr[1:-1])
[array([2]), array([0, 2]), array([4]), array([], dtype=float64), array([1])]
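
This works because in CSR format the column indices of row i are stored in b.indices[b.indptr[i]:b.indptr[i+1]], so splitting b.indices at the interior row pointers yields one array per row with no Python-level loop over the rows. The same idea can be wrapped in a small helper. This is only a sketch: the function name `nonzero_columns_by_row` is hypothetical, and dropping explicitly stored zeros and sorting the indices are optional extra steps added here, not part of the answer above.

```python
import numpy as np
import scipy.sparse as sps

def nonzero_columns_by_row(matrix):
    """Return a list with one array of nonzero column indices per row."""
    # Work on a CSR copy so the caller's matrix is left untouched.
    csr = matrix.tocsr(copy=True)
    # Optional cleanup: drop explicitly stored zeros, sort indices within rows.
    csr.eliminate_zeros()
    csr.sort_indices()
    # indptr[1:-1] holds the positions where each new row starts in indices.
    return np.split(csr.indices, csr.indptr[1:-1])

# Illustrative data only.
dense = np.array([[0, 0, 3],
                  [4, 0, 5],
                  [0, 0, 0]])
result = nonzero_columns_by_row(sps.csc_matrix(dense))
print([cols.tolist() for cols in result])  # [[2], [0, 2], []]
```

The conversion to CSR is done once for the whole matrix, which is typically much cheaper than slicing a CSC matrix row by row.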
