Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

scipy: basic clarifications

Tags:

python

scipy

I am not undetstanding the difference between coo_matrix, csr_matrix and csc_matrix.

The documentation does mention that coo_matrix is not efficient for arithmetic operations and we need to convert it to csr or csc. I am looking more into matrix multiplication. And I did not understand what is happening behind the scenes if I just have a coo_matrix and convert it to csr or csv matrix.

Also if I have something like

A = array([[1,2,3,0,0,5],
        [5,0,0,1,2,0]])
print coo_matrix(A)

It prints

  (0, 0)    1
  (0, 1)    2
  (0, 2)    3
  (0, 5)    5

which is cool. but is there a way, i can directly input my matrix as the one which is printed. Something like define a null COO matrix and then start defining the values of the coo_matrix like how we do in matlab.

Thanks!

like image 727
Gaurav Avatar asked Jun 26 '12 22:06

Gaurav


3 Answers

The terminology isn't invented by python scipy but already existed in sparse matrix representation science

There exists various formats in which sparse matrices can be represented.
Formats can be divided into two groups:

  1. Those that support efficient modification, such as DOK (Dictionary of keys), LIL (List of lists), or COO (Coordinate list). These are typically used to construct the matrices.
  2. Those that support efficient access and matrix operations, such as CSR (Compressed Sparse Row) or CSC (Compressed Sparse Column).

Coordinate list (COO)

COO stores a list of (row, column, value) tuples. Ideally, the entries are sorted (by row index, then column index) to improve random access times. This is another format which is good for incremental matrix construction

Compressed sparse row (CSR)

The compressed sparse row (CSR) or compressed row storage (CRS) format represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values, the extents of rows, and column indices. This format allows fast row access and matrix-vector multiplications.

The CSR format stores a sparse m × n matrix M in row form using three (one-dimensional) arrays (A, IA, JA). Let NNZ denote the number of nonzero entries in M. (Note that zero-based indices shall be used here.)

The array A is of length NNZ and holds all the nonzero entries of M in left-to-right top-to-bottom ("row-major") order.

  1. The array IA is of length m + 1. It is defined by this recursive definition:
    IA[0] = 0
    IA[i] = IA[i − 1] + (number of nonzero elements on the (i − 1)-th row in the original matrix)

Thus, the first m elements of IA store the index into A of the first nonzero element in each row of M, and the last element IA[m] stores NNZ, the number of elements in A, which can be also thought of as the index in A of first element of a phantom row just beyond the end of the matrix M.
The values of the i-th row of the original matrix is read from the elements A[IA[i]] to A[IA[i + 1] − 1] (inclusive on both ends), i.e. from the start of one row to the last index just before the start of the next.

The third array, JA, contains the column index in M of each element of A and hence is of length NNZ as well.

For example, the matrix 0 0 0 0
5 8 0 0
0 0 3 0
0 6 0 0

is a 4 × 4 matrix with 4 nonzero elements, hence

A = [ 5 8 3 6 ]
IA = [ 0 0 2 3 4 ]
JA = [ 0 1 2 1 ]

Source: https://en.wikipedia.org/wiki/Sparse_matrix

like image 67
Prakhar Agrawal Avatar answered Oct 01 '22 15:10

Prakhar Agrawal


A sparse matrix contains mostly zeros. coo_matrix, csr_matrix and csc_matrix are all sparse matrix classes. The coo_matrix is a list of row, column, value. This type of sparse matrix is inefficient for arithmetic because if you have a large matrix with a lot of zeros, you don't actually want to do math on all those zeros. You just want to do math on the non-zero values in your sparse matrix. The csr_matrix and csc_matrix are solutions to this problem. Instead of listing all the values in the sparse matrix, csr and csc are actually three 1-D matrices that have the non-zero value, a column index, and a row pointer (for csr) that tells where the non-zero value is inside the sparse matrix. I don't want to rewrite the textbook, so here is more info and an example.

To answer your second question. You want to use scipy.sparse.dok_matrix. This is a dictionary of keys based sparse matrix. You can edit it MATLAB style, and then convert it to csr or csc for arithmetic. Here is a simple example of editing one dynamically:

>>> A = scipy.sparse.dok_matrix((5,5))
>>> A[2,3] = 7
>>> print A
  (2, 3)      7.0
like image 30
sbraden Avatar answered Oct 01 '22 15:10

sbraden


csr_matrix consider row first and csc_matrix considers column first.

Here is a simple example to illustrate that: Let's take a matrix,

mat = [[1, 0, 0],
       [5, 0, 2],
       [0, -1, 0],
       [0, 0, 3]]

csr_matrix gives the position of non-zero element in the row first then goes to second row, then to third and so on. for instance, csr_matrix(mat) returns:

(0, 0)  1.0 -- first row
(1, 0)  5.0 -- second row
(1, 2)  2.0 -- second row
(2, 1)  -1.0 --third row
(3, 2)  3.0 -- fourth row

Similarly csc_matrix gives the position of non-zero elements in the first column, then second column, and so on.

(0, 0)  1.0 -- first column
(1, 0)  5.0 -- first column
(2, 1)  -1.0 -- second column
(1, 2)  2.0 -- third column
(3, 2)  3.0 -- third column
like image 22
DrGeneral Avatar answered Oct 01 '22 14:10

DrGeneral