
How to run the bigglm function with a large number of variables

In ffbase (http://cran.r-project.org/web/packages/ffbase/ffbase.pdf) there is the bigglm function:

bigglm.ffdf(formula, data, family = gaussian(), ...,

where formula is something like Y~X, assuming Y and X are column names of an ffdf object called data.

What if I have 200 columns in data that I want to put on the RHS of the equation? Clearly I can't type Y~X1+X2+....+X200.

How do I run Y~X1+X2+....+X200 without typing out all 200 variables on the RHS?

user2763361 asked Feb 15 '23 17:02



2 Answers

The . symbol is the usual shorthand for this, though I'm not sure whether it works with ffbase. For example:

m <- lm(y ~ ., df)

This models y using all other columns in df.

As described by Chris, this appears to be a bug in biglm. It can be worked around by expanding the . with terms() before passing the formula:

m <- bigglm(terms(y ~ ., data=df), data=df)

But this should be reported as a bug to the author of biglm.
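To see what the terms() workaround does, here is a minimal base R sketch on a small ordinary data frame. The data frame and its column names (y, x1, x2, x3) are made up for illustration, and nothing here touches ffbase or biglm; it only shows that terms() replaces the . with the remaining column names.

```r
# Hypothetical toy data frame; the column names are illustrative only.
df <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

# terms() with a data argument expands the "." into every column except y.
expanded <- terms(y ~ ., data = df)

# The right-hand-side terms after expansion.
rhs_terms <- attr(expanded, "term.labels")
print(rhs_terms)

# The fully written-out formula, which no longer contains a ".".
print(formula(expanded))
```

Because the expanded object already spells out every term, the fitting function never has to interpret the . itself.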

Sam Mason answered Feb 20 '23 08:02



If Sam's answer doesn't work, you can build up a character string representing the formula and then cast it as a formula:

formula <- as.formula(paste('Y',
       paste(paste('X', 1:200, sep = ''), collapse = ' + '), sep = ' ~ '))

The inner paste creates X1 to X200. The next paste collapses the resulting vector into a single string with the elements of the first paste put together with +'s. The last paste adds on the Y ~. Finally, I change it from a string to a formula.
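As an alternative sketch, base R's reformulate() builds the same kind of formula from a character vector of term names, so the pasting can be skipped. The toy data frame dat below is hypothetical. Taking the predictors from names() assumes every column other than Y belongs on the right-hand side, and whether names() behaves the same way on an ffdf object is an assumption not checked here.

```r
# Predictors named X1..X200, as in the question.
f200 <- reformulate(paste0("X", 1:200), response = "Y")

# Hypothetical toy data frame to show the column-name variant.
dat <- data.frame(Y = 1:5, X1 = rnorm(5), X2 = rnorm(5), X3 = rnorm(5))

# Use every column except the response as a predictor.
predictors <- setdiff(names(dat), "Y")
f <- reformulate(predictors, response = "Y")
print(f)
```

The second form does not depend on the columns following an X1, X2, ... naming pattern, which may be useful when the predictors have arbitrary names.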

Christopher Louden answered Feb 20 '23 09:02
