I want to extract the standard errors from a list of logistic regression models.
glmfunk <- function(x) glm( ldata$DFREE ~ x , family=binomial)
glmkort <- lapply(ldata[,c(2,3,5,6,7,8)],glmfunk)
sapply(glmkørt, "[[", "coefficients")
But how do i extract the standard error of the coefficients? I can't seem to find it in the str(glmkort)?
Here is the str(glmkort) for AGE where i am looking for the standard error:
str(glmkort)
List of 6
$ AGE :List of 30
..$ coefficients : Named num [1:2] -1.17201 -0.00199
.. ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
..$ residuals : Named num [1:40] -1.29 -1.29 -1.29 -1.29 4.39 ...
.. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
..$ fitted.values : Named num [1:40] 0.223 0.225 0.225 0.225 0.228 ...
.. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
..$ effects : Named num [1:40] 3.2662 -0.0282 -0.4595 -0.4464 2.042 ...
.. ..- attr(*, "names")= chr [1:40] "(Intercept)" "x" "" "" ...
..$ R : num [1:2, 1:2] -2.64 0 -86.01 14.18
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "(Intercept)" "x"
.. .. ..$ : chr [1:2] "(Intercept)" "x"
..$ rank : int 2
..$ qr :List of 5
.. ..$ qr : num [1:40, 1:2] -2.641 0.158 0.158 0.158 0.159 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:40] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:2] "(Intercept)" "x"
.. ..$ rank : int 2
.. ..$ qraux: num [1:2] 1.16 1.01
.. ..$ pivot: int [1:2] 1 2
.. ..$ tol : num 1e-11
.. ..- attr(*, "class")= chr "qr"
..$ family :List of 12
.. ..$ family : chr "binomial"
.. ..$ link : chr "logit"
.. ..$ linkfun :function (mu)
.. ..$ linkinv :function (eta)
.. ..$ variance :function (mu)
.. ..$ dev.resids:function (y, mu, wt)
.. ..$ aic :function (y, n, mu, wt, dev)
.. ..$ mu.eta :function (eta)
.. ..$ initialize: expression({ if (NCOL(y) == 1) { if (is.factor(y)) y <- y != levels(y)[1L] n <- rep.int(1, nobs) y[weights == 0] <- 0 if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1") mustart <- (weights * y + 0.5)/(weights + 1) m <- weights * y if (any(abs(m - round(m)) > 0.001)) warning("non-integer #successes in a binomial glm!") } else if (NCOL(y) == 2) { if (any(abs(y - round(y)) > 0.001)) warning("non-integer counts in a binomial glm!") n <- y[, 1] + y[, 2] y <- ifelse(n == 0, 0, y[, 1]/n) weights <- weights * n mustart <- (n * y + 0.5)/(n + 1) } else stop("for the binomial family, y must be a vector of 0 and 1's\n", "or a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
.. ..$ validmu :function (mu)
.. ..$ valideta :function (eta)
.. ..$ simulate :function (object, nsim)
.. ..- attr(*, "class")= chr "family"
..$ linear.predictors: Named num [1:40] -1.25 -1.24 -1.24 -1.24 -1.22 ...
.. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
..$ deviance : num 42.7
..$ aic : num 46.7
..$ null.deviance : num 42.7
..$ iter : int 4
..$ weights : Named num [1:40] 0.173 0.174 0.174 0.174 0.176 ...
.. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
..$ prior.weights : Named num [1:40] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
..$ df.residual : int 38
..$ df.null : int 39
..$ y : Named num [1:40] 0 0 0 0 1 0 1 0 0 0 ...
.. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
..$ converged : logi TRUE
..$ boundary : logi FALSE
..$ model :'data.frame': 40 obs. of 2 variables:
.. ..$ ldata$DFREE: int [1:40] 0 0 0 0 1 0 1 0 0 0 ...
.. ..$ x : int [1:40] 39 33 33 32 24 30 39 27 40 36 ...
.. ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 ldata$DFREE ~ x
.. .. .. ..- attr(*, "variables")= language list(ldata$DFREE, x)
.. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:2] "ldata$DFREE" "x"
.. .. .. .. .. ..$ : chr "x"
.. .. .. ..- attr(*, "term.labels")= chr "x"
.. .. .. ..- attr(*, "order")= int 1
.. .. .. ..- attr(*, "intercept")= int 1
.. .. .. ..- attr(*, "response")= int 1
.. .. .. ..- attr(*, ".Environment")=<environment: 0x017a5674>
.. .. .. ..- attr(*, "predvars")= language list(ldata$DFREE, x)
.. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. .. ..- attr(*, "names")= chr [1:2] "ldata$DFREE" "x"
..$ call : language glm(formula = ldata$DFREE ~ x, family = binomial)
..$ formula :Class 'formula' length 3 ldata$DFREE ~ x
.. .. ..- attr(*, ".Environment")=<environment: 0x017a5674>
..$ terms :Classes 'terms', 'formula' length 3 ldata$DFREE ~ x
.. .. ..- attr(*, "variables")= language list(ldata$DFREE, x)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "ldata$DFREE" "x"
.. .. .. .. ..$ : chr "x"
.. .. ..- attr(*, "term.labels")= chr "x"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: 0x017a5674>
.. .. ..- attr(*, "predvars")= language list(ldata$DFREE, x)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "ldata$DFREE" "x"
..$ data :<environment: 0x017a5674>
..$ offset : NULL
..$ control :List of 3
.. ..$ epsilon: num 1e-08
.. ..$ maxit : num 25
.. ..$ trace : logi FALSE
..$ method : chr "glm.fit"
..$ contrasts : NULL
..$ xlevels : Named list()
..- attr(*, "class")= chr [1:2] "glm" "lm"
$ BECK :List of 30
> ldata
ID AGE BECK IVHX NDRUGTX RACE TREAT SITE DFREE
1 1 39 9.000 3 1 0 1 0 0
2 2 33 34.000 2 8 0 1 0 0
3 3 33 10.000 3 3 0 1 0 0
4 4 32 20.000 3 1 0 0 0 0
5 5 24 5.000 1 5 1 1 0 1
6 6 30 32.550 3 1 0 1 0 0
7 7 39 19.000 3 34 0 1 0 1
8 8 27 10.000 3 2 0 1 0 0
9 9 40 29.000 3 3 0 1 0 0
10 10 36 25.000 3 7 0 1 0 0
11 11 38 18.900 3 8 0 1 0 0
12 12 29 16.000 1 1 0 1 0 0
13 13 32 36.000 3 2 1 1 0 1
14 14 41 19.000 3 8 0 1 0 0
15 15 31 18.000 3 1 0 1 0 0
16 16 27 12.000 3 3 0 1 0 0
17 17 28 34.000 3 6 0 1 0 0
18 18 28 23.000 2 1 0 1 0 0
19 19 36 26.000 1 15 1 1 0 1
20 20 32 18.900 3 5 0 1 0 1
21 21 33 15.000 1 1 0 0 0 1
22 22 28 25.200 3 8 0 0 0 0
23 23 29 6.632 2 0 0 0 0 0
24 24 35 2.100 3 9 0 0 0 0
25 25 45 26.000 3 6 0 0 0 0
26 26 35 39.789 3 5 0 0 0 0
27 27 24 20.000 1 3 0 0 0 0
28 28 36 16.000 3 7 0 0 0 0
29 29 39 22.000 3 9 0 0 0 1
30 30 36 9.947 2 10 0 0 0 0
31 31 37 9.450 3 1 0 0 0 0
32 32 30 39.000 3 1 0 0 0 0
33 33 44 41.000 3 5 0 0 0 0
34 34 28 31.000 1 6 1 0 0 1
35 35 25 20.000 1 3 1 0 0 0
36 36 30 8.000 3 7 0 1 0 0
37 37 24 9.000 1 1 0 0 0 0
38 38 27 20.000 1 1 0 0 0 0
39 39 30 8.000 1 2 1 0 0 1
40 40 34 8.000 3 0 0 1 0 0
Using an example from ?glm
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())
## copy twice to a list to illustrate
lmod <- list(mod1 = glm.D93, mod2 = glm.D93)
Then we could compute them as summary()
would, or extract them after calling summary()
. The former is far more efficient as you only compute what you want. The latter doesn't rely on knowing how the standard errors are derived.
The standard errors can be computed from the variance-covariance matrix of the model. The diagonal of this matrix contains the variances of the coefficients, and the standard errors are simply the square root of these variances. The vcov()
extractor function gets the variance-covariance matrix for us and we square root the diagonals with sqrt(diag())
:
> lapply(lmod, function(x) sqrt(diag(vcov(x))))
$mod1
(Intercept) outcome2 outcome3 treatment2 treatment3
0.1708987 0.2021708 0.1927423 0.2000000 0.2000000
$mod2
(Intercept) outcome2 outcome3 treatment2 treatment3
0.1708987 0.2021708 0.1927423 0.2000000 0.2000000
summary()
Or we can let summary()
compute the standard errors (and a lot more), then use lapply()
or sapply()
to apply an anonymous function that extracts coef(summary(x))
and takes the second column (in which the standard errors are stored).
lapply(lmod, function(x) coef(summary(x))[,2])
Which gives
> lapply(lmod, function(x) coef(summary(x))[,2])
$mod1
(Intercept) outcome2 outcome3 treatment2 treatment3
0.1708987 0.2021708 0.1927423 0.2000000 0.2000000
$mod2
(Intercept) outcome2 outcome3 treatment2 treatment3
0.1708987 0.2021708 0.1927423 0.2000000 0.2000000
whereas sapply()
would give:
> sapply(lmod, function(x) coef(summary(x))[,2])
mod1 mod2
(Intercept) 0.1708987 0.1708987
outcome2 0.2021708 0.2021708
outcome3 0.1927423 0.1927423
treatment2 0.2000000 0.2000000
treatment3 0.2000000 0.2000000
Depending on what you wanted to do , you could extract both the coefficients and the standard errors with a single call:
> lapply(lmod, function(x) coef(summary(x))[,1:2])
$mod1
Estimate Std. Error
(Intercept) 3.044522e+00 0.1708987
outcome2 -4.542553e-01 0.2021708
outcome3 -2.929871e-01 0.1927423
treatment2 1.337909e-15 0.2000000
treatment3 1.421085e-15 0.2000000
$mod2
Estimate Std. Error
(Intercept) 3.044522e+00 0.1708987
outcome2 -4.542553e-01 0.2021708
outcome3 -2.929871e-01 0.1927423
treatment2 1.337909e-15 0.2000000
treatment3 1.421085e-15 0.2000000
But you might prefer them separately?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With