I have a square matrix with a few tens of thousands rows and columns with only a few 1 beside tons of 0, so I use the Matrix package to store that in R in an efficient way. The base::matrix object cannot handle that amount of cells, as I run out of memory.
My problem is that I need the inverse and also the Moore-Penrose generalized inverse of such matrices, that I cannot compute at the time being.
What I have tried:
solve yields an Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory) errorMASS::ginv is not compatible with the Matrix classMatrix to e.g. bigmemory::big.matrix, and that latter neither does work with MASS::ginv anywayif I try to compute the Choleski factorization of the matrix to later call Matrix::chol2inv on that, I get the following error message:
Error in .local(x, ...) : 
  internal_chm_factor: Cholesky factorization failed
In addition: Warning message:
In .local(x, ...) :
  Cholmod warning 'not positive definite' at file ../Cholesky/t_cholmod_rowfac.c, line 431
based on a related question, I also gave a try to the pbdDMAT package on a single node, but pbdDMAT::chol yielded an Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 147 error message
Question in a nutshell: is there any way to compute the inverse and Moore-Penrose generalized inverse of such sparse matrix, beside falling back to using matrix class on a computer with tons of RAM?
Quick reproducible example:
library(Matrix)
n <- 1e5
m <- sparseMatrix(1:n, 1:n, x = 1)
m <- do.call(rBind, lapply(1:10, function(i) {
    set.seed(i)
    m[-sample(1:n, n/3), ]
}))
m <- t(m) %*% m
Some descriptives (thanks to Gabor Grothendieck):
> dim(m)
[1] 100000 100000
> sum(m) / prod(dim(m))
[1] 6.6667e-05
> table(rowSums(m))
    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701 
> table(colSums(m))
    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701 
And some error messages:
> Matrix::solve(m)
Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory)
> base::solve(m)
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
> MASS::ginv(m)
Error in MASS::ginv(m) : 'X' must be a numeric or complex matrix
The goal of the bounty is to find a way to compute the Moore-Penrose generalized inverse of m with less than 8Gb of RAM (speed and performance is not important).
) of a matrix A is a generalization of the inverse matrix. The most common use of pseudoinverse is to compute the best fit solution to a system of linear equations which lacks a unique solution. Moore – Penrose inverse is the most widely known type of matrix pseudoinverse.
Existence and uniquenessGeneralized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.
The inverse of square sparse matrices is not sparse. You can find examples in the field of graph theory.
Summarizing, to find the Moore-Penrose inverse of a matrix A: Find the Singular Value Decomposition: A=UΣV∗ (using R or Python, if you like). Find Σ+ by transposing Σ and taking the reciprocal of all its non-zero diagonal entries. Compute A+=VΣ+U∗
If you have very few 1's then your matrix will likely have no more than one 1 in any column and in any row in which case the generalized inverse equals the transpose:
library(Matrix)
set.seed(123)
n <- 1e5
m <- sparseMatrix(sample(1:n, n/10), sample(1:n, n/10), x = 1, dims = c(n, n))
##############################################################################
# confirm that m has no more than one 1 in each row and column
##############################################################################
table(rowSums(m))  # here 90,000 rows are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 
table(colSums(m))  # here 90,000 cols are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 
##############################################################################
# calculate generalized inverse
##############################################################################
minv <- t(m)
##############################################################################
# check that when multiplied by minv it gives a diagonal matrix of 0's and 1's
##############################################################################
mm <- m %*% minv
table(diag(mm)) # diagonal has 90,000 zeros and 10,000 ones
##     0     1 
## 90000 10000 
diag(mm) <- 0
range(mm) # off diagonals are all zero
## [1] 0 0
REVISED Improved presentation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With