Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

revoScaleR::rxGlm() Question in R - GLM Residuals

Tags:

r

glm

revoscaler

I might not find an answer here because I don't think the revoScaleR package is widely used.

If I create a GLM using rxGlm() it works fine. However the model residuals available via rxPredict() seem to just be the "raw" residuals, ie observed value minus fitted value. The various transformed versions (deviance residuals, pearson residuals, etc.) don't seem to be available.

Does anyone know if there's a way to achieve this? I can get deviance residuals (for example) for the model by running it again using glm() (with the same formula, data, error structure, link function, weights) and using residuals(glm_object, type = "deviance"), but this is a nuisance because glm() runs very slowly (large dataset, many model parameters).

Thanks.

Edited: to include this guidance from the literature which I'm trying to follow:

enter image description here

like image 705
Alan Avatar asked Oct 15 '22 04:10

Alan


1 Answers

It is a bit hard to fully understand from your question what the RevoScaleR package provides in terms of residuals and which residuals exactly you need. In addition, there is quite some confusion regarding the terminology of residuals, as this exemplified for example here and here.

A few points/observations which might help you nonetheless.

In linear regression, "raw" are identical to "deviance" residuals

At least that what I take from running toy regressions with glm and predicting outcomes like:

df <- mtcars
modl <- glm(formula = mpg ~ wt + qsec + am, data = mtcars)
y_hat <- predict(modl)

Next, calculate the "raw" residuals (predicted outcome minus actual outcome) as well as deviance residuals:

y <- as.vector(df[["mpg"]])
res_raw <- y - y_hat
res_dev <- residuals(modl, type = "deviance")

These two are identical:

identical(res_raw, res_dev)
[1] TRUE

I guess it's more complicated once you get into binary outcomes etc.

Formula for computing standardized deviance residuals

Standardized deviance residuals are computed from glm with the rstandard method.

res_std <- rstandard(modl)

Looking at getAnywhere(rstandard.glm) tells you how the standardized residuals can be computed by hand from the deviance residuals:

function (model, infl = influence(model, do.coef = FALSE), type = c("deviance", 
    "pearson"), ...) 
{
    type <- match.arg(type)
    res <- switch(type, pearson = infl$pear.res, infl$dev.res)
    res <- res/sqrt(summary(model)$dispersion * (1 - infl$hat)) # this is the key line
    res[is.infinite(res)] <- NaN
    res
}

So in my example, you would manually compute standardized residuals by running res/sqrt(summary(modl)$dispersion * (1 - influence(modl)$hat)). So you need two things: hat and dispersion. I assume that RevoScaleR provides the dispersion parameter. If there is nothing in RevoScaleR like influence(modl)$hat to get the hat values, you'll have to do it from scratch:

X <- as.matrix(df[, c("wt", "qsec", "am")]) # Gets the X variables
X <- cbind(rep(1, nrow(df)), X) # adds column for the constant
hat <- diag(X %*% solve(t(X) %*% X) %*% t(X)) # formula for hat values

Now compute your standardized deviance residuals:

res_man <- res_raw/sqrt(summary(modl)$dispersion * (1 - hat))

Which are the same as derived with rstandard:

head(res_man)
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout           Valiant 
       -0.6254171        -0.4941877        -1.4885771         0.2297471         0.7217423        -1.1790097 
head(res_std)
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout           Valiant 
       -0.6254171        -0.4941877        -1.4885771         0.2297471         0.7217423        -1.1790097 
like image 66
broti Avatar answered Oct 19 '22 02:10

broti