I'm trying to convert the following MATLAB code to Python and am having trouble finding a solution that works in any reasonable amount of time.
M = diag(sum(a)) - a;
where = vertcat(in, out);
M(where,:) = 0;
M(where,where) = 1;
Here, a is a sparse matrix and where is a vector (as are in/out). The solution I have using Python is:
M = scipy.sparse.diags([degs], [0]) - A
where = numpy.hstack((inVs, outVs)).astype(int)
M = scipy.sparse.lil_matrix(M)
M[where, :] = 0 # This is the slowest line
M[where, where] = 1
M = scipy.sparse.csc_matrix(M)
But since A is 334863x334863, this takes like three minutes. If anyone has any suggestions on how to make this faster, please contribute them! For comparison, MATLAB does this same step imperceptibly fast.
Thanks!
A sparse matrix can be stored in full-matrix storage mode or a packed storage mode. When a sparse matrix is stored in full-matrix storage mode, all its elements, including its zero elements, are stored in an array.
Using sparse matrices to store data that contains a large number of zero-valued elements can both save a significant amount of memory and speed up the processing of that data.
The function csr_matrix() is used to create a sparse matrix of compressed sparse row format whereas csc_matrix() is used to create a sparse matrix of compressed sparse column format.
The compressed sparse row (CSR) or compressed row storage (CRS) or Yale format represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values, the extents of rows, and column indices. It is similar to COO, but compresses the row indices, hence the name.
The solution I use for similar task attributes to @seberg and do not convert to lil
format:
import scipy.sparse
import numpy
import time
def csr_row_set_nz_to_val(csr, row, value=0):
"""Set all nonzero elements (elements currently in the sparsity pattern)
to the given value. Useful to set to 0 mostly.
"""
if not isinstance(csr, scipy.sparse.csr_matrix):
raise ValueError('Matrix given must be of CSR format.')
csr.data[csr.indptr[row]:csr.indptr[row+1]] = value
def csr_rows_set_nz_to_val(csr, rows, value=0):
for row in rows:
csr_row_set_nz_to_val(csr, row)
if value == 0:
csr.eliminate_zeros()
wrap your evaluations with timing
def evaluate(size):
degs = [1]*size
inVs = list(xrange(1, size, size/25))
outVs = list(xrange(5, size, size/25))
where = numpy.hstack((inVs, outVs)).astype(int)
start_time = time.time()
A = scipy.sparse.csc_matrix((size, size))
M = scipy.sparse.diags([degs], [0]) - A
csr_rows_set_nz_to_val(M, where)
return time.time()-start_time
and test its performance:
>>> print 'elapsed %.5f seconds' % evaluate(334863)
elapsed 0.53054 seconds
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With