
Linear regression with big matrices

Tags:

r

r-bigmemory

I would like to perform a linear regression with big matrices.

This is what I have tried so far:

library(bigmemory)
library(biganalytics)
library(bigalgebra)

nrows <- 1000000
X <- as.big.matrix( replicate(100, rnorm(nrows)) )
y <- rnorm(nrows)

biglm.big.matrix(y ~ X)
# Error in CreateNextDataFrameGenerator(formula, data, chunksize, fc, getNextChunkFunc,  : 
#   argument "data" is missing, with no default

biglm.big.matrix(y ~ X, data = cbind(y, X))
# Error in bigmemory:::mmap(vars, colnames(data)) : 
#   Couldn't find a match to one of the arguments.

biglm.big.matrix(y ~ X, data = cbind(y = y, X = X))
# Error in bigmemory:::mmap(vars, colnames(data)) : 
#   Couldn't find a match to one of the arguments.

How can I solve this problem?

mat asked Oct 19 '25 14:10


1 Answer

Here, X is a big.matrix with 100 columns. biglm.big.matrix() requires the data= argument, so it looks like you can't pass the whole matrix as a single term and have the model use every column of X at once, as you can with lm(). Note also that when you cbind() a vector with a big.matrix, as in cbind(y, X), the result is a list, not a big.matrix.

It appears that y and X both need to be columns of a single big.matrix, and that you then need to build the model formula yourself:

library(bigmemory)
library(biganalytics)
library(bigalgebra)

# Construct an empty big.matrix with the correct dimensions and
# with column names
nrows <- 1000000
ncols <- 100 # number of predictor columns
dat <- big.matrix(nrow=nrows, ncol=ncols + 1, 
                  dimnames=list(
                    NULL, # no rownames
                    c("y", paste0("X", 1:ncols)) # colnames: y, X1, X2, ..., X100
                  ))

# fill with y and X:
dat[,1] <- rnorm(nrows)
dat[,2:101] <- replicate(100, rnorm(nrows)) 

# construct the model formula as a character vector using paste:
# (Or you need to type y ~ X1 + X2 + ... + X100 manually in biglm.big.matrix()!)
f <- paste("y ~", paste(colnames(dat)[-1], collapse=" + "))

# run the model
res <- biglm.big.matrix(as.formula(f), data=dat)
summary(res)
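
If you want to sanity-check the formula-building step before running it on the full data, here is a minimal sketch on a small ordinary matrix using only base R. The names small, f_paste and f_reform are made up for illustration, and reformulate() is a base R alternative to the paste() approach, not something biglm.big.matrix() requires. I'm assuming the same column naming scheme (y, X1, X2, ...) as above.

```r
# Small stand-in for the big.matrix: 50 rows, columns y, X1, X2, X3
# (hypothetical data, only meant to exercise the formula construction)
set.seed(1)
small <- matrix(rnorm(50 * 4), nrow = 50,
                dimnames = list(NULL, c("y", paste0("X", 1:3))))

# Build the formula from the column names with paste()
f_paste <- as.formula(paste("y ~", paste(colnames(small)[-1], collapse = " + ")))

# reformulate() builds an equivalent formula without string pasting
f_reform <- reformulate(colnames(small)[-1], response = "y")

# Fit both with plain lm() on a data.frame and compare the coefficients
fit_paste  <- lm(f_paste,  data = as.data.frame(small))
fit_reform <- lm(f_reform, data = as.data.frame(small))
all.equal(coef(fit_paste), coef(fit_reform))
```

If the two fits agree on the small data, the same formula object should be usable as the first argument to biglm.big.matrix(), provided the big.matrix has matching column names.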
Scott Ritchie answered Oct 22 '25 04:10


