I have a SciPy CSR matrix and I want to get the column indices of the nonzero elements in each row. My approach is:
import scipy.sparse as sp
N = 100
d = 0.1
M = sp.rand(N, N, d, format='csr')
indM = [row.nonzero()[1] for row in M]
indM is what I need. It has the same number of rows as M and looks like this:
[array([ 6, 7, 11, ..., 79, 85, 86]),
array([12, 20, 25, ..., 84, 93, 95]),
...
array([ 7, 24, 32, 40, 50, 51, 57, 71, 74, 96]),
array([ 1, 4, 9, ..., 71, 95, 96])]
The problem is that this approach is slow for big matrices. Is there any way to avoid the list comprehension or otherwise speed this up?
Thank you.
You can simply use the indices and indptr attributes directly:
import numpy
import scipy.sparse
N = 5
d = 0.3
M = scipy.sparse.rand(N, N, d, format='csr')
M.toarray()
# array([[ 0. , 0. , 0. , 0. , 0. ],
# [ 0. , 0. , 0. , 0. , 0.30404632],
# [ 0.63503713, 0. , 0. , 0. , 0. ],
# [ 0.68865311, 0.81492098, 0. , 0. , 0. ],
# [ 0.08984168, 0.87730292, 0. , 0. , 0.18609702]])
M.indices
# array([4, 0, 0, 1, 0, 1, 4], dtype=int32)
M.indptr
# array([0, 0, 1, 2, 4, 7], dtype=int32)
numpy.split(M.indices, M.indptr)[1:-1]
# [array([], dtype=int32),
# array([4], dtype=int32),
# array([0], dtype=int32),
# array([0, 1], dtype=int32),
# array([0, 1, 4], dtype=int32)]
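This works because, in CSR storage, the column indices of row i are the slice indices[indptr[i]:indptr[i + 1]], so splitting indices at the row boundaries in indptr gives one array per row. Below is a minimal sketch that checks the split against the list comprehension from the question. The matrix size and the names by_slicing, by_split and by_nonzero are made up for illustration. One assumption to be aware of: indices also lists explicitly stored zeros, which nonzero() filters out, so the sketch calls eliminate_zeros() first to make the two comparable.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical test matrix, larger than the 5x5 example above.
M = sp.random(200, 200, density=0.1, format='csr')

# Drop explicitly stored zeros so that M.indices agrees with nonzero().
M.eliminate_zeros()

# Row i owns the slice indices[indptr[i]:indptr[i + 1]].
by_slicing = [M.indices[M.indptr[i]:M.indptr[i + 1]] for i in range(M.shape[0])]

# Same result in one call: split at the interior row boundaries only,
# which avoids the empty first and last pieces.
by_split = np.split(M.indices, M.indptr[1:-1])

# Reference result from the list comprehension in the question.
by_nonzero = [row.nonzero()[1] for row in M]

all_equal = all(
    np.array_equal(a, b) and np.array_equal(a, c)
    for a, b, c in zip(by_split, by_slicing, by_nonzero)
)
print(len(by_split), all_equal)
```

Passing M.indptr[1:-1] to numpy.split is equivalent to splitting at all of M.indptr and then dropping the first and last pieces with [1:-1]. Note that the resulting arrays are views into M.indices, so copy them if you intend to modify them.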