Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create sparse matrix with more than 2^31-1 non-zero elements [duplicate]

I use some C++ code to take a text file from a database and create a dgcMatrix type sparse matrix from the Matrix package. For the first time, I'm trying to build a matrix that has more than 2^31-1 non-sparse members, which means that the index vector in the sparse matrix object must also be longer than that limit. Unfortunately, vectors seem to use 32-bit integer indices, as do NumericVectors in Rcpp.

Short of writing an entire new data type from the ground up, does R provide any facility for this? I don't think I can use too exotic a solution as I need glmnet to recognize the resultant object.

like image 514
Patrick McCarthy Avatar asked Jun 16 '14 03:06

Patrick McCarthy


People also ask

How do you find non zero entries in a sparse matrix?

Use nonzeros , nnz , and find to locate and count nonzero matrix elements. Create a 10-by-10 random sparse matrix with 7% density of nonzeros. A = sprand(10,10,0.07); Use nonzeros to find the values of the nonzero elements.

How many nonzero entries are there in the sparse matrix?

Density of Sparse Matrix Create a sparse matrix representing the finite difference Laplacian on an L-shaped domain and calculate its density. The result indicates that only about 2% of the elements in the matrix are nonzero.

How do you create a sparse matrix?

Description. S = sparse( A ) converts a full matrix into sparse form by squeezing out any zero elements. If a matrix contains many zeros, converting the matrix to sparse storage saves memory. S = sparse( m,n ) generates an m -by- n all zero sparse matrix.


2 Answers

The sparse matrix algebra R package spam with its spam64 extension supports sparse matrices with more than 2^31-1 non-zero elements.

A simple example (requires ~50 Gb memory and takes ~5 mins to run):

## -- a regular 32-bit spam matrix
library(spam) # version 2.2-2
s <- spam(1:2^30)
summary(s) 
## Matrix object of class 'spam' of dimension 1073741824x1,
##     with 1073741824 (row-wise) nonzero elements.
##     Density of the matrix is 100%.
## Class 'spam'

## -- a 64-bit spam matrix with 2^31 non-zero entries
library(spam64)
s <- cbind(s, s) 
summary(s) 
## Matrix object of class 'spam' of dimension 1073741824x2,
##     with 2147483648 (row-wise) nonzero elements.
##     Density of the matrix is 100%.
## Class 'spam'

## -- add zeros to make the dimension 2^31 x 2^31
pad(s) <- c(2^31, 2^31) 
summary(s) 
## Matrix object of class 'spam' of dimension 2147483648x2147483648,
##     with 2147483648 (row-wise) nonzero elements.
##     Density of the matrix is 4.66e-08%.
## Class 'spam'

The implementation is based on the .C64() R interface to compiled code available in dotCall64.

Note: Not all function of spam support 64-bit matrices (yet).

Some links:

  • https://cran.r-project.org/package=spam
  • https://cran.r-project.org/package=spam64
  • https://cran.r-project.org/package=dotCall64
  • https://doi.org/10.1016/j.cageo.2016.11.015
  • https://doi.org/10.1016/j.softx.2018.06.002

I am one of the authors of dotCall64 and spam.

like image 126
Nairolf Avatar answered Sep 30 '22 17:09

Nairolf


In recent versions of R, vectors are indexed by the R_xlen_t type, which is 64 bits on 64 bits platforms and just int on 32 bit platforms.

Rcpp so far still uses int everywhere. I would encourage you to request the feature on their issue list. It is not hard, but needs systematic involvement of someone with skills, time and willingness. The development version of Rcpp11 uses the correct type, perhaps they can use that as a model.

Note however that even though R uses 64 bit unsigned integers on 64 bit plaforms, you are in fact limited to the range that can be handled by the double type, which is what R will give you if you ask for the length of a vector. R has no 64 bit integer type that it can represent natively, so when you ask for the length of a vector you either get an int or a double depending on the value.

like image 36
Romain Francois Avatar answered Sep 29 '22 17:09

Romain Francois