
Bayes predict, subscript out of bounds

I'm having trouble with the predict function when using bayesglm. I've read posts saying this error can arise when the out-of-sample data has more factor levels than the in-sample data, but I'm using the same data for both fitting and prediction. Predict works fine with regular glm, but not with bayesglm. Example:

    control <- y ~ x1 + x2

    # this works fine:
    glmObject <- glm(control, myData, family = binomial())
    predicted1 <- predict.glm(glmObject, myData, type = "response")

    # this gives an error:
    bayesglmObject <- bayesglm(control, myData, family = binomial())
    predicted2 <- predict.bayesglm(bayesglmObject, myData, type = "response")
    Error in X[, piv, drop = FALSE] : subscript out of bounds

    # Edit... I just discovered this works.
    # Should I be concerned about using these results?
    # Not sure why it fails when I specify the dataset
    predicted3 <- predict(bayesglmObject, type = "response")

I can't figure out how to predict with a bayesglm object. Any ideas? Thanks!

ch-pub asked Oct 21 '22 06:10



1 Answer

One possible cause is the default setting of the "drop.unused.levels" parameter in the bayesglm command. By default, this parameter is set to TRUE, so any unused factor levels are dropped when the model is built. However, the predict function still uses the original data, in which the unused levels are still present in the factor variable. This creates a mismatch in levels between the data used for model building and the data used for prediction, even if it is the same data frame (in your case, myData). I have given an example below:

    library(arm)  # provides bayesglm

    n <- 100
    x1 <- rnorm(n)
    x2 <- as.factor(sample(c(1, 2, 3), n, replace = TRUE))

    # Replacing 3 with 2 leaves level "3" unused (it stays in levels(x2))
    x2[x2 == 3] <- 2

    y <- as.factor(sample(c(1, 2), n, replace = TRUE))

    myData <- data.frame(x1 = x1, x2 = x2, y = y)
    control <- y ~ x1 + x2

    # this works fine:
    glmObject <- glm(control, data = myData, family = binomial())
    predicted1 <- predict.glm(glmObject, myData, type = "response")

    # this gives an error - this uses default drop.unused.levels = TRUE
    bayesglmObject <- bayesglm(control, data = myData, family = binomial())
    predicted2 <- predict.bayesglm(bayesglmObject, myData, type = "response")
    # Error in X[, piv, drop = FALSE] : subscript out of bounds

    # this works fine - value of drop.unused.levels is set to FALSE
    bayesglmObject <- bayesglm(control, data = myData, family = binomial(),
                               drop.unused.levels = FALSE)
    predicted2 <- predict.bayesglm(bayesglmObject, myData, type = "response")

I think a better way would be to call droplevels on the data frame beforehand to remove the unused levels, and then use the cleaned data frame for both model building and prediction.
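
As a rough sketch of that approach, the snippet below uses only base R to show how an unused level can be spotted and then removed with droplevels. The names cleanData and unusedLevels are made up for illustration, the seed is arbitrary, and the bayesglm calls are left as comments because they assume the arm package is installed.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 100

# Declare the three levels explicitly, then empty out level "3"
x2 <- factor(sample(c(1, 2, 3), n, replace = TRUE), levels = c(1, 2, 3))
x2[x2 == 3] <- 2

myData <- data.frame(
  x1 = rnorm(n),
  x2 = x2,
  y  = as.factor(sample(c(1, 2), n, replace = TRUE))
)

# Level "3" is still declared even though no row uses it
levels(myData$x2)   # "1" "2" "3"

# Hypothetical helper: declared levels that never occur in the data
unusedLevels <- function(f) setdiff(levels(f), unique(as.character(f)))
unusedLevels(myData$x2)   # "3"

# droplevels() removes unused levels from every factor column
cleanData <- droplevels(myData)
levels(cleanData$x2)      # "1" "2"

# Fit and predict on the same cleaned data frame (assumes arm is installed):
# bayesglmObject <- arm::bayesglm(y ~ x1 + x2, data = cleanData, family = binomial())
# predicted <- predict(bayesglmObject, cleanData, type = "response")
```

Running the check before fitting makes it easy to see whether a level mismatch could be behind the error at all.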

Ravi answered Nov 11 '22 17:11
