Run regression in data.table

Tags:

r

data.table

Here is a fake data set:

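library(data.table)
library(MASS)  # provides glm.nb()

n <- 5000
DT <- data.table(
  grp  = 1:n,
  name = as.character(as.hexmode(1:n)),
  x    = sample(1:400, n, replace = TRUE)
)
setkey(DT, grp)

# One logical dummy column per level of grp, named V1, ..., V5000
UIDlist    <- unique(DT[, grp])
IDnamelist <- paste0("V", seq_along(UIDlist))

# := adds the dummy columns to DT by reference; dropping V5000 leaves
# N-1 = 4999 dummies, with the last level of grp as the baseline
test <- DT[, (IDnamelist) := lapply(UIDlist, function(k) grp == k)][, V5000 := NULL]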

I have a data.table with three columns, "grp", "name" and "x". I then add a dummy variable for each level of "grp", and I need to run a negative binomial regression using glm.nb from the MASS package.

First I tried this:

SumResult <- glm.nb(x ~ factor(grp), data = test)

But with dummy coding, a variable with N levels should enter the model as N-1 dummies, so I didn't think this approach was appropriate.
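For reference, a small sketch (using a hypothetical three-level factor, not the data above) of how R codes a factor by default. With the default treatment contrasts, `model.matrix()` builds an intercept plus one dummy for each non-baseline level, i.e. N-1 dummies, with the first level as the baseline:

```r
# Hypothetical 3-level grouping variable (not the grp column above)
grp <- factor(c("a", "b", "c", "a", "b"))

# Default contrasts for unordered factors are treatment contrasts
print(getOption("contrasts"))

# Intercept plus one dummy per non-baseline level: (Intercept), grpb, grpc
mm <- model.matrix(~ grp)
print(colnames(mm))

# Number of dummy columns (excluding the intercept) is nlevels - 1
print(ncol(mm) - 1 == nlevels(grp) - 1)
```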

So I tried this:

SumResult <- glm.nb(x ~ V1 + V2 + V3 + V4 + ... + V4999, data = test)

Writing out all of V1, V2, ..., V4999 by hand to run the regression is impractical.

Is there a way to build this model in code?

Thanks

asked Jan 26 '14 at 13:01 by Bigchao



1 Answer

You can build the formula object by string manipulation:

formula <- as.formula(paste0("x ~ ", paste(names(test)[-(1:3)], collapse = " + ")))
sumresult <- glm.nb(formula, data = test)
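A sketch of the same idea using `reformulate()` from the stats package, which builds the formula from a character vector of term labels so you don't have to paste the formula string together yourself. Here `vars` is a hypothetical stand-in for `names(test)[-(1:3)]`:

```r
# Hypothetical predictor names standing in for V1, ..., V4999
vars <- paste0("V", 1:4)

# String-pasting approach from the answer
f_paste <- as.formula(paste("x ~", paste(vars, collapse = " + ")))

# reformulate() builds the same formula from the term labels directly
f_reform <- reformulate(vars, response = "x")

print(f_reform)
# x ~ V1 + V2 + V3 + V4
print(identical(deparse(f_paste), deparse(f_reform)))
```

Either formula can then be passed to `glm.nb(..., data = test)`.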

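You can also use the more readable approach suggested by @BrandonBertelsen. Because test is a data.table, select the columns with with = FALSE (a plain test[-c(1:3)] would drop rows 1 to 3, not columns), and keep x so that "." expands to the dummy columns only:

glm.nb(x ~ ., data = test[, setdiff(names(test), c("grp", "name")), with = FALSE])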
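A minimal, self-contained sketch on simulated data (the data, group labels, and column names are hypothetical) of two points: how "." expands to every non-response column, and that fitting glm.nb() on explicit N-1 dummies should give the same fitted means as letting factor() build the dummies. The two fits use different baseline levels, so their coefficients differ, but the fitted values should match:

```r
library(MASS)  # glm.nb()

set.seed(42)
# Hypothetical data: 3 groups of 60 negative binomial counts each
grp <- factor(rep(c("g1", "g2", "g3"), each = 60))
x   <- rnbinom(180, size = 3, mu = c(4, 8, 16)[as.integer(grp)])

# Explicit N-1 = 2 dummies; g3 is the omitted baseline level
d <- data.frame(
  x  = x,
  V1 = as.integer(grp == "g1"),
  V2 = as.integer(grp == "g2")
)

# "." stands for every column of d other than the response x
print(attr(terms(x ~ ., data = d), "term.labels"))

fit_dummies <- glm.nb(x ~ ., data = d)
# factor coding builds its own N-1 dummies (baseline g1 here)
fit_factor  <- glm.nb(x ~ grp, data = data.frame(x = x, grp = grp))

# Both models span the same columns, so their fitted means should agree
print(all.equal(fitted(fit_dummies), fitted(fit_factor), tolerance = 1e-4))
```

With one indicator per group, the fitted mean for each group is that group's sample mean in either parameterization, which is why the fitted values agree even though the coefficients do not.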
answered Oct 18 '22 at 19:10 by dickoa