Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Moore-Penrose generalized inverse of a large sparse matrix

I have a square matrix with a few tens of thousands rows and columns with only a few 1 beside tons of 0, so I use the Matrix package to store that in R in an efficient way. The base::matrix object cannot handle that amount of cells, as I run out of memory.

My problem is that I need the inverse and also the Moore-Penrose generalized inverse of such matrices, that I cannot compute at the time being.

What I have tried:

  • solve yields an Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory) error
  • MASS::ginv is not compatible with the Matrix class
  • there is no direct way to convert a sparse Matrix to e.g. bigmemory::big.matrix, and that latter neither does work with MASS::ginv anyway
  • if I try to compute the Choleski factorization of the matrix to later call Matrix::chol2inv on that, I get the following error message:

    Error in .local(x, ...) : 
      internal_chm_factor: Cholesky factorization failed
    In addition: Warning message:
    In .local(x, ...) :
      Cholmod warning 'not positive definite' at file ../Cholesky/t_cholmod_rowfac.c, line 431
    
  • based on a related question, I also gave a try to the pbdDMAT package on a single node, but pbdDMAT::chol yielded an Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 147 error message

Question in a nutshell: is there any way to compute the inverse and Moore-Penrose generalized inverse of such sparse matrix, beside falling back to using matrix class on a computer with tons of RAM?


Quick reproducible example:

library(Matrix)
n <- 1e5
m <- sparseMatrix(1:n, 1:n, x = 1)
m <- do.call(rBind, lapply(1:10, function(i) {
    set.seed(i)
    m[-sample(1:n, n/3), ]
}))
m <- t(m) %*% m

Some descriptives (thanks to Gabor Grothendieck):

> dim(m)
[1] 100000 100000
> sum(m) / prod(dim(m))
[1] 6.6667e-05
> table(rowSums(m))

    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701 
> table(colSums(m))

    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701 

And some error messages:

> Matrix::solve(m)
Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory)
> base::solve(m)
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
> MASS::ginv(m)
Error in MASS::ginv(m) : 'X' must be a numeric or complex matrix

The goal of the bounty is to find a way to compute the Moore-Penrose generalized inverse of m with less than 8Gb of RAM (speed and performance is not important).

like image 921
daroczig Avatar asked May 23 '14 09:05

daroczig


People also ask

What will be the Moore-Penrose pseudoinverse matrix of A?

) of a matrix A is a generalization of the inverse matrix. The most common use of pseudoinverse is to compute the best fit solution to a system of linear equations which lacks a unique solution. Moore – Penrose inverse is the most widely known type of matrix pseudoinverse.

Does Moore-Penrose inverse always exist?

Existence and uniquenessGeneralized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.

Is the inverse of a sparse matrix sparse?

The inverse of square sparse matrices is not sparse. You can find examples in the field of graph theory.

How do you find the Moore-Penrose inverse of a matrix?

Summarizing, to find the Moore-Penrose inverse of a matrix A: Find the Singular Value Decomposition: A=UΣV∗ (using R or Python, if you like). Find Σ+ by transposing Σ and taking the reciprocal of all its non-zero diagonal entries. Compute A+=VΣ+U∗


1 Answers

If you have very few 1's then your matrix will likely have no more than one 1 in any column and in any row in which case the generalized inverse equals the transpose:

library(Matrix)
set.seed(123)
n <- 1e5
m <- sparseMatrix(sample(1:n, n/10), sample(1:n, n/10), x = 1, dims = c(n, n))

##############################################################################
# confirm that m has no more than one 1 in each row and column
##############################################################################
table(rowSums(m))  # here 90,000 rows are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 

table(colSums(m))  # here 90,000 cols are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 

##############################################################################
# calculate generalized inverse
##############################################################################
minv <- t(m)

##############################################################################
# check that when multiplied by minv it gives a diagonal matrix of 0's and 1's
##############################################################################
mm <- m %*% minv

table(diag(mm)) # diagonal has 90,000 zeros and 10,000 ones
##     0     1 
## 90000 10000 

diag(mm) <- 0
range(mm) # off diagonals are all zero
## [1] 0 0

REVISED Improved presentation.

like image 66
G. Grothendieck Avatar answered Oct 17 '22 12:10

G. Grothendieck