Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is most efficient way of setting row to zeros for a sparse scipy matrix?

I'm trying to convert the following MATLAB code to Python and am having trouble finding a solution that works in any reasonable amount of time.

M = diag(sum(a)) - a;
where = vertcat(in, out);
M(where,:) = 0;
M(where,where) = 1;

Here, a is a sparse matrix and where is a vector (as are in/out). The solution I have using Python is:

M = scipy.sparse.diags([degs], [0]) - A
where = numpy.hstack((inVs, outVs)).astype(int)
M = scipy.sparse.lil_matrix(M)
M[where, :] = 0  # This is the slowest line
M[where, where] = 1
M = scipy.sparse.csc_matrix(M)

But since A is 334863x334863, this takes like three minutes. If anyone has any suggestions on how to make this faster, please contribute them! For comparison, MATLAB does this same step imperceptibly fast.

Thanks!

like image 482
Alex Reinking Avatar asked Nov 05 '13 08:11

Alex Reinking


People also ask

How do you store sparse matrix efficiently?

A sparse matrix can be stored in full-matrix storage mode or a packed storage mode. When a sparse matrix is stored in full-matrix storage mode, all its elements, including its zero elements, are stored in an array.

Is sparse matrix memory efficient?

Using sparse matrices to store data that contains a large number of zero-valued elements can both save a significant amount of memory and speed up the processing of that data.

What does SciPy sparse Csr_matrix do?

The function csr_matrix() is used to create a sparse matrix of compressed sparse row format whereas csc_matrix() is used to create a sparse matrix of compressed sparse column format.

What is compressed sparse row format?

The compressed sparse row (CSR) or compressed row storage (CRS) or Yale format represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values, the extents of rows, and column indices. It is similar to COO, but compresses the row indices, hence the name.


1 Answers

The solution I use for similar task attributes to @seberg and do not convert to lil format:

import scipy.sparse
import numpy
import time

def csr_row_set_nz_to_val(csr, row, value=0):
    """Set all nonzero elements (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if not isinstance(csr, scipy.sparse.csr_matrix):
        raise ValueError('Matrix given must be of CSR format.')
    csr.data[csr.indptr[row]:csr.indptr[row+1]] = value

def csr_rows_set_nz_to_val(csr, rows, value=0):
    for row in rows:
        csr_row_set_nz_to_val(csr, row)
    if value == 0:
        csr.eliminate_zeros()

wrap your evaluations with timing

def evaluate(size):
    degs = [1]*size
    inVs = list(xrange(1, size, size/25))
    outVs = list(xrange(5, size, size/25))
    where = numpy.hstack((inVs, outVs)).astype(int)
    start_time = time.time()
    A = scipy.sparse.csc_matrix((size, size))
    M = scipy.sparse.diags([degs], [0]) - A
    csr_rows_set_nz_to_val(M, where)
    return time.time()-start_time

and test its performance:

>>> print 'elapsed %.5f seconds' % evaluate(334863)
elapsed 0.53054 seconds
like image 150
alko Avatar answered Oct 12 '22 22:10

alko