
scipy csr_matrix: understand indptr

Every once in a while, I have to manipulate a csr_matrix, but I always forget how the indices and indptr parameters work together to build a sparse matrix.

I am looking for a clear and intuitive explanation of how indptr interacts with both the data and indices parameters when defining a sparse matrix using the notation csr_matrix((data, indices, indptr), [shape=(M, N)]).

I can see from the scipy documentation that the data parameter contains all the non-zero data, and the indices parameter contains the columns associated with that data (as such, indices is equal to col in the example given in the documentation). But how can the indptr parameter be explained in clear terms?

Tanguy asked Sep 12 '18 16:09




1 Answer

Maybe this explanation can help clarify the concept:

  • data is an array containing all the non-zero elements of the sparse matrix, stored row by row.
  • indices is an array mapping each element in data to its column in the sparse matrix.
  • indptr then maps the elements of data and indices to the rows of the sparse matrix. This is done with the following reasoning:

    1. If the sparse matrix has M rows, indptr is an array containing M+1 elements.
    2. For row i, the slice indptr[i]:indptr[i+1] gives the positions in data and indices that belong to row i. So if indptr[i] = k and indptr[i+1] = l, the data for row i is data[k:l], at columns indices[k:l]. This is the tricky part, and I hope the following example helps in understanding it.
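The slicing rule can also be checked directly in code. The sketch below uses the small 3x3 matrix from the scipy documentation for csr_matrix; the manual loop and the name rebuilt are only there for illustration, and it assumes numpy and scipy are installed.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Arrays taken from the scipy documentation example for csr_matrix
data = np.array([1, 2, 3, 4, 5, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
indptr = np.array([0, 2, 3, 6])  # 3 rows, so 3 + 1 entries

A = csr_matrix((data, indices, indptr), shape=(3, 3))
dense = A.toarray()

# Rebuild the same matrix by hand, one row at a time (illustrative only)
rebuilt = np.zeros((3, 3), dtype=data.dtype)
for i in range(len(indptr) - 1):
    k, l = indptr[i], indptr[i + 1]
    # data[k:l] are the values of row i, indices[k:l] are their columns
    rebuilt[i, indices[k:l]] = data[k:l]

print(dense)
print(np.array_equal(dense, rebuilt))
```

Row 0 takes data[0:2] at columns indices[0:2], row 1 takes data[2:3], and row 2 takes data[3:6], so both arrays hold the rows [1, 0, 2], [0, 0, 3] and [4, 5, 6].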

EDIT: I replaced the numbers in data with letters to avoid confusion in the following example.

[image: the example referred to above, not recovered]

Note: the values in indptr are necessarily non-decreasing, because each successive entry of indptr (the next row) points to where that row's values start in data and indices. Two consecutive entries are equal when a row has no non-zero elements.
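One case worth checking is a row with no stored values: two consecutive entries of indptr are then equal, so the sequence never decreases but can stay flat. The following is a minimal sketch with a made-up 3x4 matrix (the values 10, 20, 30 are arbitrary and chosen only for this illustration).

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x4 matrix whose middle row is entirely zero
data = np.array([10, 20, 30])
indices = np.array([0, 3, 1])
indptr = np.array([0, 2, 2, 3])  # indptr[1] == indptr[2]: row 1 is empty

A = csr_matrix((data, indices, indptr), shape=(3, 4))
dense = A.toarray()

# The differences between consecutive indptr entries are the
# number of stored values in each row
nnz_per_row = np.diff(indptr)

print(dense)
print(nnz_per_row)
print(indptr[-1] == len(data))
```

The last entry of indptr equals the total number of stored values, since it marks where the (non-existent) row after the last one would start.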

Tanguy answered Sep 19 '22 06:09