Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Accessing elements in coo_matrix

Tags:

python

scipy

This is a very simple question. For SciPy sparse matrices like coo_matrix, how does one access individual elements?

To give an analogy to Eigen linear algebra library. One can access element (i,j) using coeffRef as follows:

myMatrix.coeffRef(i,j)
like image 314
Hari Avatar asked May 11 '15 09:05

Hari


People also ask

What does Coo_matrix do in Python?

A sparse matrix in COOrdinate format. Also known as the 'ijv' or 'triplet' format. to construct an empty matrix with shape (M, N) dtype is optional, defaulting to dtype='d'.

What is compressed sparse row format?

The compressed sparse row (CSR) or compressed row storage (CRS) or Yale format represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values, the extents of rows, and column indices. It is similar to COO, but compresses the row indices, hence the name.

What is the difference between CSR and CSC?

CSC is more efficient at accessing column-vectors or column operations, generally, as it is stored as arrays of columns and their value at each row. CSR matrices are the opposite; stored as arrays of rows and their values at each column, and are more efficient at accessing row-vectors or row operations.

How do you represent a sparse matrix?

Representing a sparse matrix by a 2D array leads to wastage of lots of memory as zeroes in the matrix are of no use in most of the cases. So, instead of storing zeroes with non-zero elements, we only store non-zero elements. This means storing non-zero elements with triples- (Row, Column, value).


2 Answers

From the docs for coo_matrix:

 |  Intended Usage
 |      - COO is a fast format for constructing sparse matrices
 |      - Once a matrix has been constructed, convert to CSR or
 |        CSC format for fast arithmetic and matrix vector operations
 |      - By default when converting to CSR or CSC format, duplicate (i,j)
 |        entries will be summed together.  This facilitates efficient
 |        construction of finite element matrices and the like. (see example)

And indeed, csr_matrix supports the indexing in an expected way:

>>> from scipy.sparse import coo_matrix
>>> m = coo_matrix([[1, 2, 3], [4, 5, 6]])
>>> m1 = m.tocsr()
>>> m1[1, 2]
6
>>> m1
<2x3 sparse matrix of type '<type 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>

(The way I found the above quote from the docs was >>> help(m) which is equivalent to the online docs).

like image 182
ev-br Avatar answered Oct 06 '22 07:10

ev-br


To expand on converting a coo matrix to csr to index, here are some timings for a small sparse matrix

Make the matrix

In [158]: M=sparse.coo_matrix([[0,1,2,0,0],[0,0,0,1,0],[0,1,0,0,0]])

In [159]: timeit M[1,2]
TypeError: 'coo_matrix' object is not subscriptable

In [160]: timeit M.tocsc()[1,2]
1000 loops, best of 3: 375 µs per loop

In [161]: timeit M.tocsr()[1,2]
1000 loops, best of 3: 241 µs per loop

In [162]: timeit M.todok()[1,2]
10000 loops, best of 3: 65.8 µs per loop

In [163]: timeit M.tolil()[1,2]
1000 loops, best of 3: 270 µs per loop

Apparently for selecting a single element, dok, is fastests (counting the conversion time). This format is actually a dictionary, which of course has fast element access.

But if you are frequently accessing whole rows, or whole columns, or iterating over rows or columns, you need to read the docs more carefully, and may be do your own time tests of typical arrays.

If you are setting values, not just reading them, timings and even implementation can be different. You will get an efficiency warning if you try to change a 0 element of a csr or csc format.

like image 30
hpaulj Avatar answered Oct 06 '22 08:10

hpaulj