I am not undetstanding the difference between coo_matrix
, csr_matrix
and csc_matrix
.
The documentation does mention that coo_matrix is not efficient for arithmetic operations and we need to convert it to csr
or csc
. I am looking more into matrix multiplication. And I did not understand what is happening behind the scenes if I just have a coo_matrix
and convert it to csr
or csv
matrix.
Also if I have something like
A = array([[1,2,3,0,0,5],
[5,0,0,1,2,0]])
print coo_matrix(A)
It prints
(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 5) 5
which is cool. but is there a way, i can directly input my matrix as the one which is printed. Something like define a null COO matrix and then start defining the values of the coo_matrix
like how we do in matlab.
Thanks!
The terminology isn't invented by python scipy but already existed in sparse matrix representation science
There exists various formats in which sparse matrices can be represented.
Formats can be divided into two groups:
Coordinate list (COO)
COO stores a list of (row, column, value) tuples. Ideally, the entries are sorted (by row index, then column index) to improve random access times. This is another format which is good for incremental matrix construction
Compressed sparse row (CSR)
The compressed sparse row (CSR) or compressed row storage (CRS) format represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values, the extents of rows, and column indices. This format allows fast row access and matrix-vector multiplications.
The CSR format stores a sparse m × n matrix M in row form using three (one-dimensional) arrays (A, IA, JA). Let NNZ denote the number of nonzero entries in M. (Note that zero-based indices shall be used here.)
The array A is of length NNZ and holds all the nonzero entries of M in left-to-right top-to-bottom ("row-major") order.
Thus, the first m elements of IA store the index into A of the first nonzero element in each row of M, and the last element IA[m] stores NNZ, the number of elements in A, which can be also thought of as the index in A of first element of a phantom row just beyond the end of the matrix M.
The values of the i-th row of the original matrix is read from the elements A[IA[i]] to A[IA[i + 1] − 1] (inclusive on both ends), i.e. from the start of one row to the last index just before the start of the next.
The third array, JA, contains the column index in M of each element of A and hence is of length NNZ as well.
For example, the matrix 0 0 0 0
5 8 0 0
0 0 3 0
0 6 0 0
is a 4 × 4 matrix with 4 nonzero elements, hence
A = [ 5 8 3 6 ]
IA = [ 0 0 2 3 4 ]
JA = [ 0 1 2 1 ]
Source: https://en.wikipedia.org/wiki/Sparse_matrix
A sparse matrix contains mostly zeros. coo_matrix
, csr_matrix
and csc_matrix
are all sparse matrix classes. The coo_matrix
is a list of row, column, value. This type of sparse matrix is inefficient for arithmetic because if you have a large matrix with a lot of zeros, you don't actually want to do math on all those zeros. You just want to do math on the non-zero values in your sparse matrix. The csr_matrix
and csc_matrix
are solutions to this problem. Instead of listing all the values in the sparse matrix, csr
and csc
are actually three 1-D matrices that have the non-zero value, a column index, and a row pointer (for csr
) that tells where the non-zero value is inside the sparse matrix. I don't want to rewrite the textbook, so here is more info and an example.
To answer your second question. You want to use scipy.sparse.dok_matrix
. This is a dictionary of keys based sparse matrix. You can edit it MATLAB style, and then convert it to csr
or csc
for arithmetic. Here is a simple example of editing one dynamically:
>>> A = scipy.sparse.dok_matrix((5,5))
>>> A[2,3] = 7
>>> print A
(2, 3) 7.0
csr_matrix
consider row first and csc_matrix
considers column first.
Here is a simple example to illustrate that: Let's take a matrix,
mat = [[1, 0, 0],
[5, 0, 2],
[0, -1, 0],
[0, 0, 3]]
csr_matrix gives the position of non-zero element in the row first then goes to second row, then to third and so on. for instance, csr_matrix(mat) returns:
(0, 0) 1.0 -- first row
(1, 0) 5.0 -- second row
(1, 2) 2.0 -- second row
(2, 1) -1.0 --third row
(3, 2) 3.0 -- fourth row
Similarly csc_matrix gives the position of non-zero elements in the first column, then second column, and so on.
(0, 0) 1.0 -- first column
(1, 0) 5.0 -- first column
(2, 1) -1.0 -- second column
(1, 2) 2.0 -- third column
(3, 2) 3.0 -- third column
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With