How to efficiently remove columns from a sparse matrix that only contain zeros?

Tags:

python

numpy

scipy

sparse-matrix

What is the best way to efficiently remove columns that contain only zeros from a sparse matrix? I have a matrix which I have created and filled with data:

matrix = sp.sparse.lil_matrix((100, 100))

I now wish to remove roughly the last 20 columns, which contain only zeros. How can I do this?

asked May 19 '12 21:05

turtle

2 Answers

If this were just a numpy array, X, then you could say X != 0, which would give you a boolean array of the same shape as X, and then you could index X with that boolean array, i.e. non_zero_entries = X[X != 0].
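For reference, here is a minimal sketch of the dense case just described. The example array and the column mask keep_cols are made up for illustration and are not part of the original answer:

```python
import numpy as np

# A small dense array whose last column is all zeros (illustrative data).
X = np.array([[0, 1, 0],
              [0, 2, 0],
              [3, 0, 0]])

# Boolean mask with the same shape as X, used to pull out the non-zero values.
non_zero_entries = X[X != 0]

# Hypothetical extension: keep only the columns with at least one non-zero entry.
keep_cols = (X != 0).any(axis=0)
X_trimmed = X[:, keep_cols]
```

Note that X[X != 0] flattens the result into a 1-D array, whereas masking along axis 0 keeps the 2-D layout.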

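But this is a sparse matrix, which (at least in the SciPy version this was written against) does not support boolean indexing and also will not give you what you want if you try X != 0: it just returns a single boolean value that seems to be true only if the two operands are the exact same matrix (in memory). Newer SciPy releases implement elementwise comparison for sparse matrices, so check how your version behaves.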

What you want is the nonzero function from numpy, which also works on sparse matrices.

import numpy as np
from scipy import sparse

X = sparse.lil_matrix((100, 100))  # some sparse matrix
X[1, 17] = 1
X[17, 17] = 1
indices = np.nonzero(X)  # a tuple of two arrays: row indices, then column indices
X.tocsc()[indices]  # the values of all non-zero entries

If you want only the full columns that contain non-zero entries, then just take the second array (index 1) from indices. You do need to account for repeated indices, which occur whenever a column has more than one entry:

columns_non_unique = indices[1]
unique_columns = sorted(set(columns_non_unique))  # drop duplicates, keep column order
X.tocsc()[:, unique_columns]  # only the columns that contain a non-zero entry
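The same idea can be sketched more compactly with np.unique, which removes duplicates and sorts in one step. This is a variation on the approach above rather than the answer's own code; the names keep and trimmed and the sample entries are assumptions made for illustration:

```python
import numpy as np
from scipy import sparse

X = sparse.lil_matrix((100, 100))  # illustrative sparse matrix
X[1, 17] = 1
X[17, 17] = 1
X[3, 42] = 5

# nonzero() returns (row_indices, col_indices) of the non-zero entries.
_, cols = X.nonzero()

# Sorted, de-duplicated indices of the columns that hold at least one entry.
keep = np.unique(cols)

# Column selection is done on a CSC copy, which is laid out by column.
trimmed = X.tocsc()[:, keep]
```

Here trimmed keeps every row but only the columns listed in keep, so all-zero columns are dropped wherever they occur, not just at the end.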

answered Nov 11 '22 02:11

gabe

Slicing looks like the way to do it, although it is not ideally efficient:

matrix = matrix[0:100,0:80]
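If the number of trailing empty columns is not known in advance, the slice bound can be computed instead of hardcoded. The following is a rough sketch under that assumption; last_col, trimmed, and the sample entries are hypothetical names and data, not part of the original answer:

```python
import numpy as np
from scipy import sparse

matrix = sparse.lil_matrix((100, 100))  # illustrative data
matrix[0, 5] = 2.0
matrix[99, 79] = 1.0

# Column indices of all non-zero entries.
_, cols = matrix.nonzero()

# One past the last column that holds data (0 if the matrix is entirely empty).
last_col = int(cols.max()) + 1 if cols.size else 0

# Slice off the trailing all-zero columns on a CSC copy.
trimmed = matrix.tocsc()[:, :last_col]
```

This only drops trailing empty columns; empty columns located before last_col are kept.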

answered Nov 11 '22 03:11

Hakan Serce
