 

Sparse matrix: how to get nonzero indices for each row

I have a SciPy CSR matrix and I want to get the column indices of the nonzero elements in each row. My approach is:

import scipy.sparse as sp
N = 100
d = 0.1
M = sp.rand(N, N, d, format='csr')

indM = [row.nonzero()[1] for row in M]
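Because the random output below is elided, it may help to run the same comprehension on a small fixed matrix. This is a sketch; the dense values are arbitrary and only chosen so that one row is empty:

```python
import numpy as np
import scipy.sparse as sp

# Small fixed matrix (values are arbitrary, for illustration only);
# row 0 is deliberately empty
dense = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.2],
    [0.0, 0.3, 0.0, 0.0],
])
M = sp.csr_matrix(dense)

# Same list comprehension as in the question: one array of column indices per row
indM = [row.nonzero()[1] for row in M]
for i, cols in enumerate(indM):
    print(i, cols)
```

An empty row yields an empty array, so indM always has exactly M.shape[0] entries.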

indM is what I need: it has the same number of rows as M and looks like this:

[array([ 6,  7, 11, ..., 79, 85, 86]),
 array([12, 20, 25, ..., 84, 93, 95]),
...
 array([ 7, 24, 32, 40, 50, 51, 57, 71, 74, 96]),
 array([ 1,  4,  9, ..., 71, 95, 96])]

The problem is that this approach is slow for big matrices. Is there any way to avoid the list comprehension or otherwise speed this up?

Thank you.

Asked Jun 14 '17 at 05:06 by Alexey Trofimov




1 Answer

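You can use the indices and indptr attributes of the CSR matrix directly. The column indices of row i are stored in M.indices[M.indptr[i]:M.indptr[i+1]], so splitting M.indices at the positions given by M.indptr produces one array per row: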

import numpy
import scipy.sparse

N = 5
d = 0.3
M = scipy.sparse.rand(N, N, d, format='csr')
M.toarray()
# array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
#        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.30404632],
#        [ 0.63503713,  0.        ,  0.        ,  0.        ,  0.        ],
#        [ 0.68865311,  0.81492098,  0.        ,  0.        ,  0.        ],
#        [ 0.08984168,  0.87730292,  0.        ,  0.        ,  0.18609702]])

M.indices
# array([4, 0, 0, 1, 0, 1, 4], dtype=int32)
M.indptr
# array([0, 0, 1, 2, 4, 7], dtype=int32)

numpy.split(M.indices, M.indptr)[1:-1]
# [array([], dtype=int32),
#  array([4], dtype=int32),
#  array([0], dtype=int32),
#  array([0, 1], dtype=int32),
#  array([0, 1, 4], dtype=int32)]
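The same idea can be wrapped in a small helper and checked against the list comprehension from the question. This is a sketch: the name row_column_indices is hypothetical, and the fixed matrix reproduces the dense example above so the output is deterministic. Splitting at the interior boundaries M.indptr[1:-1] gives the same result as slicing off the first and last pieces with [1:-1]. One caveat: M.indices lists every stored entry, including explicitly stored zeros, whereas nonzero() skips them, so calling M.eliminate_zeros() first keeps the two approaches in agreement.

```python
import numpy as np
import scipy.sparse as sp


def row_column_indices(M):
    """Hypothetical helper: column indices of the stored entries in each row.

    Assumes M is a scipy.sparse CSR matrix. Row i's indices live in
    M.indices[M.indptr[i]:M.indptr[i + 1]]; splitting at the interior
    boundaries indptr[1:-1] yields exactly one array per row, with empty
    arrays for empty rows.
    """
    return np.split(M.indices, M.indptr[1:-1])


# Fixed version of the 5x5 example above
dense = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.3],
    [0.6, 0.0, 0.0, 0.0, 0.0],
    [0.7, 0.8, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0, 0.2],
])
M = sp.csr_matrix(dense)
print(M.indptr)   # row boundaries into M.indices
print(M.indices)  # column index of every stored entry, row by row

fast = row_column_indices(M)

# Cross-check against the list comprehension from the question
slow = [row.nonzero()[1] for row in M]
assert all(np.array_equal(a, b) for a, b in zip(fast, slow))

# Caveat: explicitly stored zeros show up in M.indices but not in nonzero()
M2 = M.copy()
M2.data[0] = 0.0                    # row 1's only entry becomes an explicit zero
print(row_column_indices(M2)[1])    # still contains column 4
M2.eliminate_zeros()                # drop explicit zeros in place
print(row_column_indices(M2)[1])    # now empty
```

Note that np.split returns views into M.indices, so the per-row arrays share memory with the matrix; copy them if you intend to modify them.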
Answered Oct 08 '22 at 09:10 by Nils Werner