I have a sparse matrix produced by sklearn's TfidfVectorizer. I believe that some of its rows are all-zero, and I want to remove them. However, as far as I know, the existing built-in functions, e.g. nonzero() and eliminate_zeros(), focus on zero entries rather than on rows.
Is there any easy way to remove all-zero rows from a sparse matrix?
Example: What I have now (actually in sparse format):
[ [0, 0, 0]
[1, 0, 2]
[0, 0, 1] ]
What I want to get:
[ [1, 0, 2]
[0, 0, 1] ]
Slicing + getnnz() does the trick:
M = M[M.getnnz(1)>0]
This works directly on a csr_array.
You can also remove all-zero columns without changing formats:
M = M[:,M.getnnz(0)>0]
However, if you want to remove both, you need to chain the two slices:
M = M[M.getnnz(1)>0][:,M.getnnz(0)>0] #GOOD
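Passing both masks in a single indexing call,
M = M[M.getnnz(1)>0, M.getnnz(0)>0] #BAD
does not work. As far as I can tell, this is because two index arrays given together are paired up element by element, as in NumPy fancy indexing, instead of selecting every combination of the kept rows and kept columns.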
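To illustrate the chained form, here is a small sketch on a hypothetical 3x4 matrix that has one all-zero row and two all-zero columns. The names row_mask, col_mask and M_clean are made up for this example. Since getnnz() counts stored entries, including zeros that are stored explicitly, the sketch calls eliminate_zeros() first; that step is only needed if your matrix may contain such entries.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical matrix: row 0 and columns 1 and 3 are all-zero.
M = csr_matrix(np.array([[0, 0, 0, 0],
                         [1, 0, 2, 0],
                         [0, 0, 1, 0]]))

# Drop explicitly stored zeros (in place) so getnnz() does not count them.
M.eliminate_zeros()

# Compute both masks on the original matrix.
row_mask = M.getnnz(axis=1) > 0  # which rows have at least one entry
col_mask = M.getnnz(axis=0) > 0  # which columns have at least one entry

# Filter rows first, then columns, as two separate indexing steps.
M_clean = M[row_mask][:, col_mask]

print(M_clean.toarray())
```

Computing both masks on the original matrix is safe here, because removing all-zero rows does not change the number of entries in any column.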