Moore-Penrose generalized inverse of a large sparse matrix

Tags:

I have a square matrix with a few tens of thousands rows and columns with only a few 1 beside tons of 0, so I use the Matrix package to store that in R in an efficient way. The base::matrix object cannot handle that amount of cells, as I run out of memory.

My problem is that I need the inverse and also the Moore-Penrose generalized inverse of such matrices, that I cannot compute at the time being.

What I have tried:

solve yields an Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory) error
MASS::ginv is not compatible with the Matrix class
there is no direct way to convert a sparse Matrix to e.g. bigmemory::big.matrix, and that latter neither does work with MASS::ginv anyway

if I try to compute the Choleski factorization of the matrix to later call Matrix::chol2inv on that, I get the following error message:

Error in .local(x, ...) : 
  internal_chm_factor: Cholesky factorization failed
In addition: Warning message:
In .local(x, ...) :
  Cholmod warning 'not positive definite' at file ../Cholesky/t_cholmod_rowfac.c, line 431

based on a related question, I also gave a try to the pbdDMAT package on a single node, but pbdDMAT::chol yielded an Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 147 error message

Question in a nutshell: is there any way to compute the inverse and Moore-Penrose generalized inverse of such sparse matrix, beside falling back to using matrix class on a computer with tons of RAM?

Quick reproducible example:

library(Matrix)
n <- 1e5
m <- sparseMatrix(1:n, 1:n, x = 1)
m <- do.call(rBind, lapply(1:10, function(i) {
    set.seed(i)
    m[-sample(1:n, n/3), ]
}))
m <- t(m) %*% m

Some descriptives (thanks to Gabor Grothendieck):

> dim(m)
[1] 100000 100000
> sum(m) / prod(dim(m))
[1] 6.6667e-05
> table(rowSums(m))

    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701 
> table(colSums(m))

    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701

And some error messages:

> Matrix::solve(m)
Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory)
> base::solve(m)
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
> MASS::ginv(m)
Error in MASS::ginv(m) : 'X' must be a numeric or complex matrix

The goal of the bounty is to find a way to compute the Moore-Penrose generalized inverse of m with less than 8Gb of RAM (speed and performance is not important).

921

asked May 23 '14 09:05

daroczig

1 Answers

If you have very few 1's then your matrix will likely have no more than one 1 in any column and in any row in which case the generalized inverse equals the transpose:

library(Matrix)
set.seed(123)
n <- 1e5
m <- sparseMatrix(sample(1:n, n/10), sample(1:n, n/10), x = 1, dims = c(n, n))

##############################################################################
# confirm that m has no more than one 1 in each row and column
##############################################################################
table(rowSums(m))  # here 90,000 rows are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 

table(colSums(m))  # here 90,000 cols are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 

##############################################################################
# calculate generalized inverse
##############################################################################
minv <- t(m)

##############################################################################
# check that when multiplied by minv it gives a diagonal matrix of 0's and 1's
##############################################################################
mm <- m %*% minv

table(diag(mm)) # diagonal has 90,000 zeros and 10,000 ones
##     0     1 
## 90000 10000 

diag(mm) <- 0
range(mm) # off diagonals are all zero
## [1] 0 0

REVISED Improved presentation.

answered Oct 17 '22 12:10

G. Grothendieck

Related questions
                            
                                automatically adjust R plot size in a PDF output
                            
                                Should I use mget(), .. or with=FALSE to select columns of a data.table?
                            
                                Efficiently getting older versions of R packages
                            
                                weight equivalent for geom_density2d
                            
                                Preview development R documentation
                            
                                Convert data from data.table to matrix efficiently (speed and memory)
                            
                                R tm package used for predictive analytics. How one classifies a new document?
                            
                                slidify - how to position an image?
                            
                                Rcpp can't find Rtools: "Error 1 occurred building shared library"
                            
                                Is there a way to add "author" metadata to a pdf created from R
                            
                                Error in RStudioGD() : Shadow graphics device error: r error 4 (R code execution error)
                            
                                Display graphics in plots window pane rather than opening new window in RStudio
                            
                                integrating R with Rsruby
                            
                                R YaleToolkit: How to change the font size of tick labels on the sparklines?
                            
                                Pushing to Github from RStudio [closed]
                            
                                gbm::interact.gbm vs. dismo::gbm.interactions
                            
                                Highlight Borders When Mouseover Fill Area - Leaflet-R
                            
                                R Studio 1.1.456 (Windows): Why do I sometimes have to install binary packages instead of installing from the source?
                            
                                How to debug an R package (with C code) in Emacs using GDB?
                            
                                Why does glm.nb throw a "missing value" error only on very specific inputs

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Moore-Penrose generalized inverse of a large sparse matrix

Tags:

r

large-data

linear-algebra

sparse-matrix

matrix-inverse

daroczig

People also ask

1 Answers

G. Grothendieck

Recent Activity

Donate For Us