What is the best way to efficiently remove columns from a sparse matrix that only contain zeros? I have a matrix which I have created and filled with data:
matrix = sp.sparse.lil_matrix((100, 100))
I now wish to remove roughly the last 20 columns, which contain only zeros. How can I do this?
You could then convert your sparse array to CSC format, and the exact same trick will get rid of the all-zero columns. Your first solution depends on the array being pruned. Depending on why there are 0s, it may need eliminate_zeros() to match the results of the second method.
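One way to sketch the CSC idea is shown below. It is not necessarily the exact trick the answer above refers to. The helper name drop_zero_columns is hypothetical, and the sketch assumes that counting stored entries per column with getnnz(axis=0) after eliminate_zeros() is an acceptable definition of "all zero".

```python
import numpy as np
from scipy import sparse


def drop_zero_columns(m):
    """Return a CSC copy of `m` without its all-zero columns (hypothetical helper)."""
    # copy=True so that eliminate_zeros() never modifies the caller's matrix.
    csc = sparse.csc_matrix(m, copy=True)
    # Explicitly stored zeros would otherwise be counted as entries.
    csc.eliminate_zeros()
    # Number of stored entries per column; keep the columns that have any.
    keep = np.flatnonzero(csc.getnnz(axis=0) > 0)
    return csc[:, keep]


matrix = sparse.lil_matrix((100, 100))
matrix[1, 17] = 1
matrix[17, 17] = 1
matrix[5, 79] = 3
trimmed = drop_zero_columns(matrix)
```

Note that this drops every empty column, not only the trailing ones, so the column positions of the remaining data change.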
To remove the all-zero rows, you can sum the absolute values of each row (to avoid getting a zero sum from a mix of negative and positive numbers), which gives you a column vector of the row sums, and then keep the index of each row where the sum is non-zero.
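The row-sum steps above can be sketched roughly as follows. The helper name drop_zero_rows and the small example matrix are made up for illustration, and converting to CSR first is an assumption made because row indexing is cheap in that format.

```python
import numpy as np
from scipy import sparse


def drop_zero_rows(m):
    """Return a CSR copy of `m` without its all-zero rows (hypothetical helper)."""
    csr = sparse.csr_matrix(m)
    # Sum of absolute values per row, so +1 and -1 in one row do not cancel out.
    row_sums = np.asarray(abs(csr).sum(axis=1)).ravel()
    # Indices of the rows whose absolute sum is non-zero.
    keep = np.flatnonzero(row_sums != 0)
    return csr[keep, :]


example = sparse.lil_matrix((4, 3))
example[0, 1] = 2.0
example[2, 0] = -1.0  # this row sums to zero without abs()
example[2, 2] = 1.0
trimmed_rows = drop_zero_rows(example)
```

The same idea applies to columns by summing over axis=0 and indexing the columns of a CSC matrix instead.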
A matrix counts as sparse when most of its elements are zero, so to check whether a matrix is sparse we only need to count the elements that are equal to zero and compare that count with the total number of elements.
If this were just a numpy array, X, then you could say X!=0, which would give you a boolean array of the same shape as X, and then you could index X with the boolean array, i.e. non_zero_entries = X[X!=0]
But this is a sparse matrix. In the SciPy version this answer was written for, it does not support boolean indexing and X!=0 will not give you what you want either: it just returns a single boolean value that seems to be true only if the two are the exact same matrix (in memory). Newer SciPy releases return a sparse boolean matrix from X!=0 instead.
What you want is the nonzero function from numpy.
import numpy as np
from scipy import sparse
X = sparse.lil_matrix((100,100)) # some sparse matrix
X[1,17] = 1
X[17,17] = 1
indices = np.nonzero(X) # a tuple of two arrays: 0th is row indices, 1st is cols
X.tocsc()[indices] # this just gives you the array of all non-zero entries
If you want only the full columns that contain non-zero entries, just take element 1 of indices. You do need to account for repeated indices (when there is more than one entry in a column):
columns_non_unique = indices[1]
unique_columns = sorted(set(columns_non_unique))
X.tocsc()[:,unique_columns]
This looks like the way to do it, although it is not ideally efficient:
matrix = matrix[0:100,0:80]
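If the number of trailing empty columns is not known in advance, the slice bound can be derived from the data instead of being hard-coded. The following is only a sketch under that assumption; the variable names n_keep and trimmed_tail and the example entries are made up for illustration.

```python
import numpy as np
from scipy import sparse

matrix = sparse.lil_matrix((100, 100))
matrix[3, 10] = 1.0
matrix[40, 79] = 2.0  # last column that holds data

# Column indices of all non-zero entries.
_, cols = matrix.nonzero()
# Keep everything up to and including the last non-empty column.
n_keep = int(cols.max()) + 1 if cols.size else 0
# CSC is the format suited to column slicing.
trimmed_tail = matrix.tocsc()[:, :n_keep]
```

Unlike dropping every empty column, this only trims the tail, so the column indices of the remaining data stay the same.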