I might not find an answer here because I don't think the revoScaleR
package is widely used.
If I create a GLM using rxGlm()
it works fine. However the model residuals available via rxPredict()
seem to just be the "raw" residuals, ie observed value minus fitted value. The various transformed versions (deviance residuals, pearson residuals, etc.) don't seem to be available.
Does anyone know if there's a way to achieve this? I can get deviance residuals (for example) for the model by running it again using glm()
(with the same formula, data, error structure, link function, weights) and using residuals(glm_object, type = "deviance")
, but this is a nuisance because glm()
runs very slowly (large dataset, many model parameters).
Thanks.
Edited: to include this guidance from the literature which I'm trying to follow:
It is a bit hard to fully understand from your question what the RevoScaleR
package provides in terms of residuals and which residuals exactly you need. In addition, there is quite some confusion regarding the terminology of residuals, as this exemplified for example here and here.
A few points/observations which might help you nonetheless.
At least that what I take from running toy regressions with glm
and predicting outcomes like:
df <- mtcars
modl <- glm(formula = mpg ~ wt + qsec + am, data = mtcars)
y_hat <- predict(modl)
Next, calculate the "raw" residuals (predicted outcome minus actual outcome) as well as deviance residuals:
y <- as.vector(df[["mpg"]])
res_raw <- y - y_hat
res_dev <- residuals(modl, type = "deviance")
These two are identical:
identical(res_raw, res_dev)
[1] TRUE
I guess it's more complicated once you get into binary outcomes etc.
Standardized deviance residuals are computed from glm
with the rstandard
method.
res_std <- rstandard(modl)
Looking at getAnywhere(rstandard.glm)
tells you how the standardized residuals can be computed by hand from the deviance residuals:
function (model, infl = influence(model, do.coef = FALSE), type = c("deviance",
"pearson"), ...)
{
type <- match.arg(type)
res <- switch(type, pearson = infl$pear.res, infl$dev.res)
res <- res/sqrt(summary(model)$dispersion * (1 - infl$hat)) # this is the key line
res[is.infinite(res)] <- NaN
res
}
So in my example, you would manually compute standardized residuals by running res/sqrt(summary(modl)$dispersion * (1 - influence(modl)$hat))
. So you need two things: hat
and dispersion
. I assume that RevoScaleR
provides the dispersion parameter. If there is nothing in RevoScaleR
like influence(modl)$hat
to get the hat values, you'll have to do it from scratch:
X <- as.matrix(df[, c("wt", "qsec", "am")]) # Gets the X variables
X <- cbind(rep(1, nrow(df)), X) # adds column for the constant
hat <- diag(X %*% solve(t(X) %*% X) %*% t(X)) # formula for hat values
Now compute your standardized deviance residuals:
res_man <- res_raw/sqrt(summary(modl)$dispersion * (1 - hat))
Which are the same as derived with rstandard
:
head(res_man)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout Valiant
-0.6254171 -0.4941877 -1.4885771 0.2297471 0.7217423 -1.1790097
head(res_std)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout Valiant
-0.6254171 -0.4941877 -1.4885771 0.2297471 0.7217423 -1.1790097
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With