The SciPy Sparse Matrix tutorial is very good -- but it actually leaves the section on slicing un(der)developed (still in outline form -- see section: "Handling Sparse Matrices").
I will try and update the tutorial, once this question is answered.
I have a large sparse matrix -- currently in dok_matrix format.
import numpy as np
from scipy import sparse
M = sparse.dok_matrix((10**6, 10**6))
For various methods I want to be able to slice columns and for others I want to slice rows. Ideally I would use advanced indexing (i.e. a boolean vector, bool_vect) with which to slice a sparse matrix M -- as in:
bool_vect = np.arange(10**6) % 2 == 0 # True at every even index
out = M[bool_vect,:] # Want to select every even row
or
out = M[:,bool_vect] # Want to select every even column
First off, dok_matrices do not support this -- but I think it works (slowly) if I first cast to lil_matrices, via sparse.lil_matrix(M).
As far as I can gather from the tutorial -- to slice columns I want to use CSC and to slice rows I want to use CSR. So does that mean I should cast the matrix M via:
M.tocsc()[:,bool_vect]
or
M.tocsr()[bool_vect,:]
I am kinda guessing here and my code is slow because of it. Any help from someone who understands how this works would be appreciated. Thanks in advance.
If it turns out I should not be indexing my matrix with a boolean array, but rather a list of integers (indices) -- that is also something I would be happy to find out. Whichever is more efficient.
Finally -- this is a big matrix, so bonus points if this can happen in place / with broadcasting.
Ok, so I'm pretty sure the "right" way to do this is: if you are slicing columns, use tocsc() and slice using a list/array of integers; if you are slicing rows, use tocsr() in the same way. Boolean vectors do not seem to do the trick with sparse matrices -- the way they do with ndarrays in numpy. Which means the answer is:
indices = np.where(bool_vect)[0]
out1 = M.tocsc()[:,indices]
out2 = M.tocsr()[indices,:]
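For anyone who wants to try this on something small first, here is a minimal sketch of the same idea. The size n = 1000 and the three nonzero entries are made up for illustration, and np.flatnonzero is swapped in as an equivalent of np.where(bool_vect)[0]. Converting once and reusing the CSR/CSC copies is my assumption about what keeps repeated slicing cheap, not something I have benchmarked.

```python
import numpy as np
from scipy import sparse

n = 1000  # small stand-in for the 10**6 x 10**6 matrix (illustrative size)
M = sparse.dok_matrix((n, n))
M[0, 1] = 1.0  # made-up entries, just so there is something to slice
M[2, 3] = 2.0
M[5, 4] = 3.0

bool_vect = np.arange(n) % 2 == 0    # True at even positions
indices = np.flatnonzero(bool_vect)  # integer positions of the True entries

M_csr = M.tocsr()  # convert once, reuse for row slices
M_csc = M.tocsc()  # convert once, reuse for column slices

rows = M_csr[indices, :]  # keeps rows 0, 2, 4, ...
cols = M_csc[:, indices]  # keeps columns 0, 2, 4, ...

print(rows.shape, rows.nnz)
print(cols.shape, cols.nnz)
```

Row i of the original ends up at position i // 2 in rows (and likewise for columns), so the entry stored at M[2, 3] should show up at rows[1, 3].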
But question: is this the best way? Is this in place?
In practice this does seem to be happening in place -- and it is much faster than prior attempts (using lil_matrix).
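One way to probe the in-place question, assuming "in place" means the sliced result reuses the data buffer of the matrix it came from, is to ask NumPy whether the two data arrays overlap. This is only a sketch on a small made-up matrix; the sizes and entries are illustrative.

```python
import numpy as np
from scipy import sparse

M = sparse.dok_matrix((100, 100))  # small illustrative matrix
M[0, 0] = 1.0
M[2, 5] = 2.0

indices = np.arange(0, 100, 2)  # even positions as integer indices
M_csr = M.tocsr()
out = M_csr[indices, :]

# Do the slice and the CSR matrix share any memory in their data arrays?
shares = np.shares_memory(M_csr.data, out.data)
print("slice shares data buffer with M_csr:", shares)
print("original shape:", M.shape, "slice shape:", out.shape)
```

If the check reports no shared memory, the slice is a separate matrix and the original M is left untouched, so the speedup would be coming from the CSR/CSC layout rather than from avoiding a copy.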