 

scipy sparse matrix: remove the rows whose all elements are zero

I have a sparse matrix produced by sklearn's TfidfVectorizer. I believe that some of its rows are all zeros, and I want to remove them. However, as far as I know, the existing built-in functions, e.g. nonzero() and eliminate_zeros(), operate on individual zero entries rather than on whole rows.

Is there any easy way to remove all-zero rows from a sparse matrix?

Example: What I have now (actually in sparse format):

[ [0, 0, 0]
  [1, 0, 2]
  [0, 0, 1] ]

What I want to get:

[ [1, 0, 2]
  [0, 0, 1] ]
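To reproduce the example in sparse form, here is a minimal sketch assuming a SciPy `csr_matrix` (TfidfVectorizer returns a CSR sparse matrix). Although `nonzero()` reports individual entries, its row indices are enough to tell which rows contain anything, so they can be used to select the non-empty rows. The names `keep` and `M_trimmed` are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The example from the question, stored in CSR format
M = csr_matrix(np.array([[0, 0, 0],
                         [1, 0, 2],
                         [0, 0, 1]]))

# nonzero() returns (row_indices, col_indices) of the nonzero entries;
# the unique row indices are exactly the rows that are not all zero
rows, _ = M.nonzero()
keep = np.unique(rows)       # sorted, so the original row order is kept

M_trimmed = M[keep]
print(M_trimmed.toarray())
# [[1 0 2]
#  [0 0 1]]
```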
asked Jul 02 '15 15:07 by Munichong


1 Answer

Slicing + getnnz() does the trick:

M = M[M.getnnz(1)>0]
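A runnable version of the one-liner, applied to the example matrix from the question and assuming a `csr_matrix`. One caveat worth hedging: `getnnz()` counts *stored* entries, so a row whose only stored value is an explicit zero would be kept. Calling `eliminate_zeros()` first (a no-op when there are none) guards against that; the small matrix `E` is a hypothetical illustration of the effect.

```python
import numpy as np
from scipy.sparse import csr_matrix

M = csr_matrix(np.array([[0, 0, 0],
                         [1, 0, 2],
                         [0, 0, 1]]))

# Drop explicitly stored zeros so getnnz() only counts real nonzeros
M.eliminate_zeros()

# getnnz(axis=1) gives the number of stored entries in each row
row_counts = M.getnnz(axis=1)   # [0, 2, 1]

# Boolean row mask: keep only rows with at least one entry
M = M[row_counts > 0]
print(M.toarray())
# [[1 0 2]
#  [0 0 1]]

# Hypothetical 2x2 matrix whose first row stores an explicit 0
E = csr_matrix((np.array([0, 5]), np.array([0, 1]), np.array([0, 1, 2])),
               shape=(2, 2))
print(E.getnnz(axis=1))   # [1 1]  (the explicit 0 is counted)
E.eliminate_zeros()
print(E.getnnz(axis=1))   # [0 1]
```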

This works directly on csr_array. You can also remove all-zero columns without changing formats:

M = M[:,M.getnnz(0)>0]
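The column version on the same example, as a sketch assuming a `csr_matrix`: `getnnz(axis=0)` counts stored entries per column, and the result stays in CSR format. The name `col_counts` is illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

M = csr_matrix(np.array([[0, 0, 0],
                         [1, 0, 2],
                         [0, 0, 1]]))

# getnnz(axis=0) gives the number of stored entries in each column
col_counts = M.getnnz(axis=0)   # [1, 0, 2]

# Keep every row, but only the columns that contain something
M = M[:, col_counts > 0]
print(M.format, M.shape)   # csr (3, 2)
print(M.toarray())
# [[0 0]
#  [1 2]
#  [0 1]]
```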

However, if you want to remove both all-zero rows and all-zero columns, you need to chain the two selections:

M = M[M.getnnz(1)>0][:,M.getnnz(0)>0] #GOOD
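Applied to the example matrix (assuming a `csr_matrix`), the chained version removes both the empty row and the empty column. The right-hand side is evaluated before `M` is reassigned, so both masks come from the original matrix; that is safe because dropping all-zero rows changes neither the number of columns nor any column's count. For CSR specifically, the per-row counts can also be read from the index pointer array as `np.diff(M.indptr)`. The names `row_mask`, `col_mask` and `N` are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

M = csr_matrix(np.array([[0, 0, 0],
                         [1, 0, 2],
                         [0, 0, 1]]))

# Both masks are computed from the original M before reassignment
row_mask = M.getnnz(axis=1) > 0   # [False, True, True]
col_mask = M.getnnz(axis=0) > 0   # [True, False, True]

# Row selection first, then column selection on the result
M = M[row_mask][:, col_mask]
print(M.toarray())
# [[1 2]
#  [0 1]]

# For CSR, stored entries per row are the differences of indptr
N = csr_matrix(np.array([[0, 0, 0], [1, 0, 2], [0, 0, 1]]))
assert (np.diff(N.indptr) == N.getnnz(axis=1)).all()
```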

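By contrast,

M = M[M.getnnz(1)>0, M.getnnz(0)>0] #BAD

does not work. When both indices are arrays, SciPy follows NumPy's fancy-indexing rules: the selected row indices and column indices are paired up elementwise instead of being combined into every row/column combination. Depending on how many rows and columns the masks keep, this either raises a shape-mismatch error or returns only the paired elements rather than a submatrix.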
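A sketch of the usual single-step fix, assuming SciPy's NumPy-style indexing rules: wrapping the two masks in `np.ix_` turns them into an open mesh (a column of row indices and a row of column indices), which requests every row/column combination and gives the same result as the chained version. The second part uses a hypothetical 3x4 matrix `W` whose masks keep 2 rows but 3 columns; passing the two masks directly then asks SciPy to pair index arrays of lengths 2 and 3, which cannot be broadcast, so an error is raised (the exact exception type may vary between SciPy versions, hence both are caught).

```python
import numpy as np
from scipy.sparse import csr_matrix

M = csr_matrix(np.array([[0, 0, 0],
                         [1, 0, 2],
                         [0, 0, 1]]))
row_mask = M.getnnz(axis=1) > 0
col_mask = M.getnnz(axis=0) > 0

# np.ix_ builds an open mesh, so the selection is an outer product
# of the kept rows and the kept columns
both = M[np.ix_(row_mask, col_mask)]
print(both.toarray())
# [[1 2]
#  [0 1]]

# Hypothetical 3x4 matrix: 2 non-empty rows, 3 non-empty columns
W = csr_matrix(np.array([[0, 0, 0, 0],
                         [1, 0, 2, 3],
                         [0, 0, 0, 1]]))
try:
    # Elementwise pairing of 2 row indices with 3 column indices fails
    W[W.getnnz(axis=1) > 0, W.getnnz(axis=0) > 0]
    raised = False
except (IndexError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)
```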

answered Nov 03 '22 20:11 by Daniel Mahler