Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

scipy.sparse : Set row to zeros

Suppose I have a matrix in the CSR format, what is the most efficient way to set a row (or rows) to zeros?

The following code runs quite slowly:

A = A.tolil()
A[indices, :] = 0
A = A.tocsr()

I had to convert to scipy.sparse.lil_matrix because the CSR format seems to support neither fancy indexing nor setting values to slices.

like image 386
Ashwin Srinath Avatar asked Aug 26 '12 12:08

Ashwin Srinath


People also ask

What does Scipy sparse Csr_matrix do?

Returns a copy of column i of the matrix, as a (m x 1) CSR matrix (column vector). Format of a matrix representation as a string. Maximum number of elements to display when printed. Number of stored values, including explicit zeros.

What is Indptr?

1. In words, indptr (index pointer) represents the indices for partitioning the data and indices (column indices). To fill in the matrix with the (nonzero) data, knowing the column indices is clearly not enough.

What is Lil_matrix?

lil_matrix((M, N), [dtype]) to construct an empty matrix with shape (M, N) dtype is optional, defaulting to dtype='d'. Notes. Sparse matrices can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.

What is compressed sparse row format?

The compressed sparse row (CSR) or compressed row storage (CRS) or Yale format represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values, the extents of rows, and column indices. It is similar to COO, but compresses the row indices, hence the name.


2 Answers

I guess scipy just does not implement it, but the CSR format would support this quite well, please read the wikipedia article on "Sparse matrix" about what indptr, etc. are:

# A.indptr is an array, one for each row (+1 for the nnz):

def csr_row_set_nz_to_val(csr, row, value=0):
    """Set all nonzero elements (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if not isinstance(csr, scipy.sparse.csr_matrix):
        raise ValueError('Matrix given must be of CSR format.')
    csr.data[csr.indptr[row]:csr.indptr[row+1]] = value

# Now you can just do:
for row in indices:
    csr_row_set_nz_to_val(A, row, 0)

# And to remove zeros from the sparsity pattern:
A.eliminate_zeros()

Of course this removes 0s that were set from another place with eliminate_zeros from the sparsity pattern. If you want to do that (at this point) depends on what you are doing really, ie. elimination might make sense to delay until all other calculations that might add new zero's are done as well, or in some cases you may have 0 values, that you want to change again later, so it would be very bad to eliminate them!

You could in principle of course short-circuit the eliminate_zeros and prune, but that should be a lot of hassle, and might be even slower (because you won't do it in C).


Details about eliminiate_zeros (and prune)

The sparse matrix, does generally not save zero elements, but just stores where the nonzero elements are (roughly and with various methods). eliminate_zeros removes all zeros in your matrix from the sparsity pattern (ie. there is no value stored for that position, when before there was a vlaue stored, but it was 0). Eliminate is bad if you want to change a 0 to a different value lateron, otherwise, it saves space.

Prune would just shrink the data arrays stored when they are longer then necessary. Note that while I first had A.prune() in there, A.eliminiate_zeros() already includes prune.

like image 167
seberg Avatar answered Jan 03 '23 11:01

seberg


You can use matrix dot product to achieve that zeroing. Since the matrix we'll use is very sparse (diagonal with zeros for the rows/columns we which to zero out), multiplication should be efficient.

You'll need one of the following functions:

import scipy.sparse

def zero_rows(M, rows):
    diag = scipy.sparse.eye(M.shape[0]).tolil()
    for r in rows:
        diag[r, r] = 0
    return diag.dot(M)

def zero_columns(M, columns):
    diag = scipy.sparse.eye(M.shape[1]).tolil()
    for c in columns:
        diag[c, c] = 0
    return M.dot(diag)

Usage example:

>>> A = scipy.sparse.csr_matrix([[1,0,3,4], [5,6,0,8], [9,10,11,0]])
>>> A
<3x4 sparse matrix of type '<class 'numpy.int64'>'
        with 9 stored elements in Compressed Sparse Row format>
>>> A.toarray()
array([[ 1,  0,  3,  4],
       [ 5,  6,  0,  8],
       [ 9, 10, 11,  0]], dtype=int64)

>>> B = zero_rows(A, [1])
>>> B
<3x4 sparse matrix of type '<class 'numpy.float64'>'
        with 6 stored elements in Compressed Sparse Row format>
>>> B.toarray()
array([[  1.,   0.,   3.,   4.],
       [  0.,   0.,   0.,   0.],
       [  9.,  10.,  11.,   0.]])

>>> C = zero_columns(A, [1, 3])
>>> C
<3x4 sparse matrix of type '<class 'numpy.float64'>'
        with 5 stored elements in Compressed Sparse Row format>
>>> C.toarray()
array([[  1.,   0.,   3.,   0.],
       [  5.,   0.,   0.,   0.],
       [  9.,   0.,  11.,   0.]])
like image 43
dubek Avatar answered Jan 03 '23 13:01

dubek