 

R package Matrix: get number of non-zero entries per rows / columns of a sparse matrix

I have a large sparse matrix ("dgCMatrix", dimension 5e+5 x 1e+6). I need to count the non-zero values in each column and list the names of the columns that have exactly one non-zero entry.

My code works for small matrices, but becomes too computationally intensive for the actual matrix I need to work on.

library(Matrix)
set.seed(0)
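## toy example: a 10 x 20 sparse 0/1 matrix
mat <- Matrix(matrix(rbinom(200, 1, 0.10), ncol = 20))
colnames(mat) <- letters[1:20]

## non-zeros per column = nrow minus the number of zeros;
## index colnames() directly so a single match is not lost to drop = TRUE
entries <- colnames(mat)[nrow(mat) - colSums(mat == 0) == 1]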

Any suggestion is very welcome!

Asked by MCS, Oct 18 '25 23:10


2 Answers

I have a large sparse matrix ("dgCMatrix")

Let us call it dgCMat.

I need to count for each column how many non-zero values there are

xx <- diff(dgCMat@p)

and make a list of column names with only 1 non-zero entry

colnames(dgCMat)[xx == 1]
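Why `diff(dgCMat@p)` works: in a "dgCMatrix" the column pointer slot `p` has `ncol + 1` entries, and the stored entries of column j (1-based) occupy positions `p[j] + 1` to `p[j + 1]` of the `i` and `x` slots, so the differences of `p` are the per-column counts. Below is a minimal sketch on a hypothetical 4 x 3 toy matrix `m` (not from the question). One caveat, assuming your matrix may contain explicitly stored zeros: `diff(@p)` counts *stored* entries, so `drop0()` should be applied first in that case.

```r
library(Matrix)

## hypothetical toy 4 x 3 sparse matrix; column "b" has exactly one non-zero
m <- sparseMatrix(i = c(1, 3, 2, 1, 4),
                  j = c(1, 1, 2, 3, 3),
                  x = c(5, 1, 2, 7, 3),
                  dims = c(4, 3),
                  dimnames = list(NULL, c("a", "b", "c")))

## p = c(0, 2, 3, 5): column j's entries sit at positions p[j] + 1 .. p[j + 1]
m@p
nnz_col <- diff(m@p)          # 2 1 2
colnames(m)[nnz_col == 1]     # "b"

## an explicitly stored zero is still counted by diff(@p); drop0() removes it
m2 <- m
m2@x[1] <- 0                  # entry (1, 1) of column "a" is now a stored 0
diff(m2@p)                    # still 2 1 2
diff(drop0(m2)@p)             # 1 1 2
```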

Summary

nnz: number of non-zeros

For a "dgCMatrix" dgCMat:

## nnz per column
diff(dgCMat@p)

## nnz per row (nbins = nrow() keeps trailing rows that have no non-zeros)
tabulate(dgCMat@i + 1, nbins = nrow(dgCMat))

For a "dgRMatrix" dgRMat:

## nnz per column
tabulate(dgRMat@j + 1, nbins = ncol(dgRMat))

## nnz per row
diff(dgRMat@p)

For a "dgTMatrix" dgTMat:

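## nnz per column (a "dgTMatrix" may hold repeated (i, j) pairs,
## which would be counted separately here)
tabulate(dgTMat@j + 1, nbins = ncol(dgTMat))

## nnz per row
tabulate(dgTMat@i + 1, nbins = nrow(dgTMat))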
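A quick way to check that the slot-based recipes agree: the sketch below builds a hypothetical 5 x 4 toy matrix `m` whose last row and last column are entirely zero, converts it to the three storage classes with `as()`, and compares every recipe with the class-agnostic `colSums(m != 0)` / `rowSums(m != 0)`. The empty trailing row shows why `nbins` matters: `tabulate()` only returns as many bins as the largest index it sees.

```r
library(Matrix)

## hypothetical 5 x 4 toy matrix; row 5 and column 4 are entirely zero
m <- sparseMatrix(i = c(1, 3, 2, 1, 4),
                  j = c(1, 1, 2, 3, 3),
                  x = c(5, 1, 2, 7, 3),
                  dims = c(5, 4))

mC <- m                            # "dgCMatrix"
mR <- as(m, "RsparseMatrix")       # "dgRMatrix"
mT <- as(m, "TsparseMatrix")       # "dgTMatrix"

## reference counts via the class-agnostic route
ref_col <- colSums(m != 0)         # 2 1 2 0
ref_row <- rowSums(m != 0)         # 2 1 1 1 0

## slot-based counts; nbins keeps the trailing all-zero row/column
col_C <- diff(mC@p)
row_C <- tabulate(mC@i + 1, nbins = nrow(mC))
col_R <- tabulate(mR@j + 1, nbins = ncol(mR))
row_R <- diff(mR@p)
col_T <- tabulate(mT@j + 1, nbins = ncol(mT))
row_T <- tabulate(mT@i + 1, nbins = nrow(mT))

## without nbins, tabulate() stops at the largest row index seen
length(tabulate(mC@i + 1))         # 4, not 5
```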

I did not read your original code when posting this answer, so I did not know you were stuck on the use of mat == 0. Only later did I add a note on the difference between mat == 0 and mat != 0 to your answer.

Your workaround using mat != 0 makes good use of the package's sparse logical matrices: mat != 0 stays sparse, whereas mat == 0 does not. The same line of code should also work with other sparse matrix classes. Mine reads the internal storage directly, so each class needs its own version.

Answered by Zheyuan Li, Oct 20 '25 13:10


The following gives the same result; note the comments:

## `mat != 0` returns a "lgCMatrix" which is sparse
## don't try `mat == 0` as that is dense, simply because there are too many zeros
entries <- colnames(mat)[colSums(mat != 0) == 1]
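A minimal sketch to check the one-liner, reusing the question's toy setup (with `sparse = TRUE` added as an assumption so the storage is guaranteed to be sparse). It confirms that `mat != 0` stays a sparse matrix, that the result matches the slot-based `diff(mat@p)` from the other answer, and that the same line also works on a triplet-form copy. The class of `mat == 0` is only printed, not asserted here.

```r
library(Matrix)

set.seed(0)
mat <- Matrix(matrix(rbinom(200, 1, 0.10), ncol = 20), sparse = TRUE)
colnames(mat) <- letters[1:20]

nz <- mat != 0
class(nz)                      # a sparse logical matrix
is(nz, "sparseMatrix")         # TRUE
class(mat == 0)                # inspect: the answer's comment says this is dense

## class-agnostic one-liner from the answer above
entries <- colnames(mat)[colSums(mat != 0) == 1]

## slot-based version for a "dgCMatrix"; both should agree
entries_p <- colnames(mat)[diff(mat@p) == 1]
identical(entries, entries_p)  # TRUE

## the same one-liner on triplet ("TsparseMatrix") storage
entries_T <- colnames(mat)[colSums(as(mat, "TsparseMatrix") != 0) == 1]
identical(entries, entries_T)  # TRUE
```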

Answered by MCS, Oct 20 '25 12:10


