I'm attempting to apply the sweep function to a sparse matrix (dgCMatrix). Unfortunately, when I do that I get a memory error. It seems that sweep is expanding my sparse matrix to a full dense matrix.
Is there an easy way to perform this operation without it blowing up my memory?
This is what I'm trying to do.
sparse_matrix <- sweep(sparse_matrix, 1, vector_to_multiply, '*')
I'm working with a large, very sparse dgTMatrix (200k rows and 10k columns) in an NLP problem. After hours of thinking about a good solution, I wrote an alternative sweep function for sparse matrices. It is very fast and memory efficient: it took just 1 second and less than 1 GB of memory to multiply all matrix rows by an array of weights. For margin = 1 it works for both dgCMatrix and dgTMatrix. For margin = 2 it relies on the x@j slot, which only dgTMatrix has.
Here it is:
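sweep_sparse <- function(x, margin, stats, fun = "*") {
  f <- match.fun(fun)
  if (margin == 1) {
    # 1-based row index of each stored entry (dgCMatrix and dgTMatrix)
    idx <- x@i + 1
  } else {
    # 1-based column index of each stored entry (dgTMatrix only)
    idx <- x@j + 1
  }
  # apply fun to the stored values only, so the matrix stays sparse
  x@x <- f(x@x, stats[idx])
  return(x)
}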
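If you also need margin = 2 on a dgCMatrix, the column index of each stored entry can be rebuilt from the column pointers in x@p. Below is a minimal sketch of that idea, checked against base sweep on a small dense copy. The names nonzero_index and sweep_sparse_any are hypothetical helpers of mine, not part of the Matrix package or of the function above, and the test matrix and weights are made up for illustration.

```r
library(Matrix)

# Hypothetical helper: 1-based row or column index of every stored entry.
nonzero_index <- function(x, margin) {
  if (margin == 1) {
    x@i + 1L                                   # both classes have @i
  } else if (inherits(x, "dgTMatrix")) {
    x@j + 1L                                   # triplet form stores @j directly
  } else {
    # dgCMatrix: diff(x@p) is the number of stored entries in each column
    rep(seq_len(ncol(x)), times = diff(x@p))
  }
}

# Hypothetical variant of sweep_sparse that uses the helper above.
sweep_sparse_any <- function(x, margin, stats, fun = "*") {
  f <- match.fun(fun)
  x@x <- f(x@x, stats[nonzero_index(x, margin)])
  x
}

# Small made-up example, compared with base sweep on a dense copy.
set.seed(1)
m <- rsparsematrix(5, 4, density = 0.4)        # a dgCMatrix
m_t <- as(m, "TsparseMatrix")                  # the same data as a dgTMatrix
w_rows <- c(1, 2, 3, 4, 5)
w_cols <- c(10, 20, 30, 40)

same <- function(a, b) isTRUE(all.equal(as.matrix(a), b, check.attributes = FALSE))
stopifnot(
  same(sweep_sparse_any(m,   1, w_rows), sweep(as.matrix(m), 1, w_rows, "*")),
  same(sweep_sparse_any(m,   2, w_cols), sweep(as.matrix(m), 2, w_cols, "*")),
  same(sweep_sparse_any(m_t, 2, w_cols), sweep(as.matrix(m), 2, w_cols, "*"))
)
```

One caveat that applies to both versions: only the stored entries are touched, so the result matches sweep only when fun maps an implicit zero to zero, as "*" does with finite weights or "/" with nonzero weights. With "+" or "-" the zeros would change too, and the true result is no longer sparse.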