Please see the fake data set below.
library(data.table)
library(MASS)

n <- 5000
DT <- data.table(
  grp  = 1:n,
  name = as.character(as.hexmode(1:n)),
  x    = sample(1:400, n, replace = TRUE)
)
setkey(DT, grp)

# One logical dummy column per level of grp, named V1 ... V5000
UIDlist    <- unique(DT[, grp])
IDnamelist <- paste0("V", seq_along(UIDlist))
test <- DT[, (IDnamelist) := lapply(UIDlist, function(u) grp == u)]
# Drop the last dummy so that only N - 1 of them remain
test[, V5000 := NULL]
I have a data.table with the columns "grp", "name" and "x" (my real data also has a "y" column, which the fake data above leaves out). I add one dummy column for each level of "grp", and then I need to run a regression using glm.nb from the MASS package.
First I tried this:
SumResult <- glm.nb(x ~ factor(grp), data = test)
But when adding dummies, we have to keep in mind that for N levels in "grp" we should add only N-1 dummies, so I don't think this method is appropriate.
So I tried this:
SumResult <- glm.nb( x ~ V1 + V2 + V3 + V4 + .....+ V4999 , data = test)
It would be silly to type out all of V1, V2, ..., V4999 by hand to run the regression.
Is there code that can do this for me?
Thanks
You can build the formula object by string manipulation:
formula <- as.formula(paste0("x ~ ", paste(names(test)[-(1:3)], collapse = " + ")))
sumresult <- glm.nb(formula, data = test)
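For a version that runs quickly, here is a minimal sketch of the same idea on a small made-up data set. The data frame small, its six groups and the simulated counts are assumptions for illustration only, and reformulate() from base R stands in for the paste0()/as.formula() combination.

```r
library(MASS)

set.seed(1)
# Hypothetical small data set: 6 groups with 50 observations each
small <- data.frame(grp = rep(1:6, each = 50))
small$x <- rnbinom(nrow(small), size = 2, mu = 5 + small$grp)

# One logical dummy per group, leaving out the last group as the reference
for (g in 1:5) small[[paste0("V", g)]] <- small$grp == g

# Collect the dummy column names and build x ~ V1 + ... + V5 from them
dummy_cols <- grep("^V[0-9]+$", names(small), value = TRUE)
f <- reformulate(dummy_cols, response = "x")

fit <- glm.nb(f, data = small)
print(f)
```

On the data from the question, the equivalent call would presumably be reformulate(names(test)[-(1:3)], response = "x").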
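You can also use the more readable approach suggested by @BrandonBertelsen. Because test is a data.table, drop the "grp" and "name" columns by name in j (test[-c(1:3)] would drop rows rather than columns, and x has to stay in the data as the response):
glm.nb(x ~ ., data = test[, !c("grp", "name"), with = FALSE])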
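On the N-1 dummies concern raised in the question: as far as I know, under R's default treatment contrasts a factor() term in a formula is already expanded into an intercept plus N-1 dummy columns, so the first attempt may not need hand-made dummies at all. A small check on a made-up four-level vector (the vector grp_demo is an assumption for illustration):

```r
# Hypothetical grouping vector with 4 levels
grp_demo <- c(1, 2, 3, 4, 1, 2, 3, 4)

# Design matrix that a model with the term factor(grp_demo) would use
mm <- model.matrix(~ factor(grp_demo))

# Intercept plus one dummy for each level except the first
print(colnames(mm))
print(ncol(mm))
```

The first level is absorbed into the intercept, which is the same N-1 coding the manual dummies are meant to achieve.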