
How to efficiently remove columns from a sparse matrix that only contain zeros?

What is the best way to efficiently remove columns from a sparse matrix that contain only zeros? I have a matrix that I have created and filled with data:

matrix = sp.sparse.lil_matrix((100, 100))

I now wish to remove roughly the last 20 columns, which contain only zeros. How can I do this?

turtle asked May 19 '12 21:05



2 Answers

If this were just a numpy array, X, then you could say X!=0 which would give you a boolean array of the same shape as X, and then you could index X with the boolean array, i.e. non_zero_entries = X[X!=0]

But this is a sparse matrix, which (at least in the SciPy version I tried) does not support boolean indexing. X!=0 will not give you what you want either: it just returns a single boolean value that seems to be true only if the two operands are the exact same matrix (in memory).

What you want is the nonzero function from numpy (sparse matrices also provide their own nonzero() method).

import numpy as np
from scipy import sparse

X = sparse.lil_matrix((100,100)) # some sparse matrix
X[1,17] = 1
X[17,17] = 1
indices = np.nonzero(X) # a tuple of two arrays: 0th is row indices, 1st is cols
X.tocsc()[indices] # this just gives you the array of all non-zero entries

If you want only the full columns that contain non-zero entries, take the second array from indices (index 1, the column indices). You also need to account for repeated indices, which occur when a column has more than one non-zero entry:

columns_non_unique = indices[1]
unique_columns = sorted(set(columns_non_unique))
X.tocsc()[:,unique_columns]
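
The same idea can be wrapped in a small helper. This is only a sketch, not part of the original answer: the function name drop_empty_columns is hypothetical, and it assumes that counting stored entries per column through the CSC indptr array is acceptable once explicitly stored zeros have been eliminated.

```python
import numpy as np
from scipy import sparse


def drop_empty_columns(matrix):
    """Return (trimmed CSC matrix, kept column indices) for a sparse matrix.

    Hypothetical helper: columns that hold no non-zero value are dropped.
    """
    # copy=True so the caller's matrix is never modified in place
    csc = sparse.csc_matrix(matrix, copy=True)
    # Remove explicitly stored zeros so they do not count as entries
    csc.eliminate_zeros()
    # In CSC format, consecutive indptr differences give entries per column
    entries_per_column = np.diff(csc.indptr)
    # Indices of the columns that have at least one stored entry
    keep = np.flatnonzero(entries_per_column)
    return csc[:, keep], keep


X = sparse.lil_matrix((100, 100))
X[1, 17] = 1
X[17, 17] = 1
X[5, 42] = 3

trimmed, kept = drop_empty_columns(X)
print(trimmed.shape)  # (100, 2)
print(kept)           # [17 42]
```

Because the column counts are read directly from indptr, no Python-level set or sort is needed, and the kept indices come back already sorted and unique.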
gabe answered Nov 11 '22 02:11


This looks like the way to do it, although it is not ideally efficient:

matrix = matrix[0:100,0:80]
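
A slightly more defensive variant of the slice is sketched below. It assumes the cut-off column is known in advance (n_keep is a hypothetical name, and the sample values are made up for illustration), converts to CSC first because column slicing is generally cheaper in that format, and checks that the discarded columns really are empty before dropping them.

```python
from scipy import sparse

matrix = sparse.lil_matrix((100, 100))
matrix[3, 10] = 2.0    # illustrative data only
matrix[50, 79] = 1.0

n_keep = 80  # hypothetical cut-off: columns 80..99 are expected to be empty

csc = matrix.tocsc()  # column-oriented format for column slicing

# Guard: refuse to drop columns that still hold a non-zero value
if csc[:, n_keep:].count_nonzero() != 0:
    raise ValueError("columns to be removed are not all zero")

trimmed = csc[:, :n_keep]
print(trimmed.shape)  # (100, 80)
```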
Hakan Serce answered Nov 11 '22 03:11