Slicing Sparse Matrices in Scipy -- Which Types Work Best?

Tags: python, slice, indexing, scipy, sparse-matrix

The SciPy Sparse Matrix tutorial is very good -- but it actually leaves the section on slicing un(der)developed (still in outline form -- see section: "Handling Sparse Matrices").

I will try to update the tutorial once this question is answered.

I have a large sparse matrix -- currently in dok_matrix format.

    import numpy as np
    from scipy import sparse

    M = sparse.dok_matrix((10**6, 10**6))

For some methods I want to be able to slice columns, and for others I want to slice rows. Ideally I would use advanced indexing (i.e. a boolean vector, bool_vect) to slice the sparse matrix M -- as in:

    bool_vect = np.arange(10**6) % 2 == 0  # True at every even index
    out = M[bool_vect, :]                  # want to select every even row

or

    out = M[:, bool_vect]  # want to select every even column

First off, dok_matrices do not support this -- but I think it works (slowly) if I first cast to lil_matrices, via sparse.lil_matrix(M).

As far as I can gather from the tutorial, to slice columns I want to use CSC and to slice rows I want to use CSR. So does that mean I should cast the matrix M via:

    M.tocsc()[:, bool_vect]

or

    M.tocsr()[bool_vect, :]
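To make the CSR-for-rows / CSC-for-columns intuition concrete, here is a minimal sketch of how the two formats lay out their data. The tiny matrix and the helper names csr_row and csc_col are made up for illustration; they are not part of SciPy. The point is only that CSR stores each row as one contiguous run of its arrays (and CSC each column), which is why slicing along that axis is cheap.

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix (sizes and values are made up for this sketch).
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0]])

A_csr = sparse.csr_matrix(dense)
A_csc = sparse.csc_matrix(dense)

def csr_row(A, i):
    """Hypothetical helper: (column indices, values) of row i from the CSR arrays."""
    start, stop = A.indptr[i], A.indptr[i + 1]
    return A.indices[start:stop], A.data[start:stop]

def csc_col(A, j):
    """Hypothetical helper: (row indices, values) of column j from the CSC arrays."""
    start, stop = A.indptr[j], A.indptr[j + 1]
    return A.indices[start:stop], A.data[start:stop]

row_cols, row_vals = csr_row(A_csr, 2)  # row 2 holds 4 and 5, in columns 0 and 1
col_rows, col_vals = csc_col(A_csc, 2)  # column 2 holds 2 and 3, in rows 0 and 1
```

Reading a row out of a CSC matrix (or a column out of a CSR matrix) has no such contiguous run to grab, so the entries have to be searched for across the whole structure.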

I am kinda guessing here and my code is slow because of it. Any help from someone who understands how this works would be appreciated. Thanks in advance.

If it turns out I should not be indexing my matrix with a boolean array, but rather a list of integers (indices) -- that is also something I would be happy to find out. Whichever is more efficient.
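For reference, turning a boolean mask into integer indices is a one-liner, so the two approaches are easy to compare. The following is a minimal sketch on a tiny stand-in matrix (n = 6 and the nonzero values are assumptions chosen for illustration, not the real data), checking the sparse result against plain NumPy boolean indexing on the dense equivalent.

```python
import numpy as np
from scipy import sparse

n = 6  # tiny stand-in for the 10**6 in the question
M = sparse.dok_matrix((n, n))
for i in range(n):
    M[i, (i + 1) % n] = i + 1.0  # arbitrary nonzeros, for illustration only

mask = np.arange(n) % 2 == 0    # True at even positions
indices = np.flatnonzero(mask)  # same result as np.where(mask)[0]

rows = M.tocsr()[indices, :]    # every even row
cols = M.tocsc()[:, indices]    # every even column

# Dense reference, only feasible because this example is tiny.
dense = M.toarray()
rows_match = np.array_equal(rows.toarray(), dense[mask, :])
cols_match = np.array_equal(cols.toarray(), dense[:, mask])
```

The dense comparison is only there to confirm the indices select what the mask would; on a matrix of the real size it would not fit in memory.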

Finally -- this is a big matrix, so bonus points if this can happen in place / with broadcasting.

346

asked Nov 12 '12 21:11

gabe

1 Answer

Ok, so I'm pretty sure the "right" way to do this is: if you are slicing columns, use tocsc(), if you are slicing rows, use tocsr(), and in both cases slice using a list/array of integers. Boolean vectors do not seem to do the trick with sparse matrices the way they do with ndarrays in numpy. Which means the answer is:

    indices = np.where(bool_vect)[0]
    out1 = M.tocsc()[:, indices]
    out2 = M.tocsr()[indices, :]

But the question remains: is this the best way? Is this in place?

In practice this does seem to be happening in place -- and it is much faster than prior attempts (using lil_matrix).
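One way to probe the in-place question directly is to check whether the sliced result shares memory with the matrix it came from. This is a minimal sketch on a made-up 3x2 matrix. On the SciPy versions I am aware of, integer-array indexing returns a new matrix rather than a view, but treat that as an assumption and run the check on your own install.

```python
import numpy as np
from scipy import sparse

original = np.array([[1.0, 0.0],
                     [0.0, 2.0],
                     [3.0, 0.0]])
A = sparse.csr_matrix(original)
indices = np.array([0, 2])

out = A[indices, :]  # integer-array row selection

# Does the result reuse A's value buffer?
shares = np.shares_memory(out.data, A.data)

# Overwrite the slice's values and see whether A is affected.
out.data[:] = -1.0
unchanged = np.array_equal(A.toarray(), original)
```

If shares is False and unchanged is True, the slice is a copy. Note also that tocsr() and tocsc() on a DOK matrix each build a new matrix in the target format, so the conversion step itself is not in place either.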

132

answered Sep 23 '22 08:09

gabe
