Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract all standard errors of coefficients from list of logistic regressions

I want to extract the standard errors from a list of logistic regression models.

This is the logistic regression function, designed this way so i can run more than one analysis at once:

glmfunk <- function(x) glm(  ldata$DFREE ~  x , family=binomial)

I run it on a subset of the variables in the dataframe ldata:

glmkort <- lapply(ldata[,c(2,3,5,6,7,8)],glmfunk)

I can extract the coefficients like this:

sapply(glmkørt, "[[", "coefficients")

But how do i extract the standard error of the coefficients? I can't seem to find it in the str(glmkort)?

Here is the str(glmkort) for AGE where i am looking for the standard error:

str(glmkort)
List of 6
 $ AGE    :List of 30
  ..$ coefficients     : Named num [1:2] -1.17201 -0.00199
  .. ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
  ..$ residuals        : Named num [1:40] -1.29 -1.29 -1.29 -1.29 4.39 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ fitted.values    : Named num [1:40] 0.223 0.225 0.225 0.225 0.228 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ effects          : Named num [1:40] 3.2662 -0.0282 -0.4595 -0.4464 2.042 ...
  .. ..- attr(*, "names")= chr [1:40] "(Intercept)" "x" "" "" ...
  ..$ R                : num [1:2, 1:2] -2.64 0 -86.01 14.18
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "(Intercept)" "x"
  .. .. ..$ : chr [1:2] "(Intercept)" "x"
  ..$ rank             : int 2
  ..$ qr               :List of 5
  .. ..$ qr   : num [1:40, 1:2] -2.641 0.158 0.158 0.158 0.159 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:40] "1" "2" "3" "4" ...
  .. .. .. ..$ : chr [1:2] "(Intercept)" "x"
  .. ..$ rank : int 2
  .. ..$ qraux: num [1:2] 1.16 1.01
  .. ..$ pivot: int [1:2] 1 2
  .. ..$ tol  : num 1e-11
  .. ..- attr(*, "class")= chr "qr"
  ..$ family           :List of 12
  .. ..$ family    : chr "binomial"
  .. ..$ link      : chr "logit"
  .. ..$ linkfun   :function (mu)  
  .. ..$ linkinv   :function (eta)  
  .. ..$ variance  :function (mu)  
  .. ..$ dev.resids:function (y, mu, wt)  
  .. ..$ aic       :function (y, n, mu, wt, dev)  
  .. ..$ mu.eta    :function (eta)  
  .. ..$ initialize:  expression({     if (NCOL(y) == 1) {         if (is.factor(y))              y <- y != levels(y)[1L]         n <- rep.int(1, nobs)         y[weights == 0] <- 0         if (any(y < 0 | y > 1))              stop("y values must be 0 <= y <= 1")         mustart <- (weights * y + 0.5)/(weights + 1)         m <- weights * y         if (any(abs(m - round(m)) > 0.001))              warning("non-integer #successes in a binomial glm!")     }     else if (NCOL(y) == 2) {         if (any(abs(y - round(y)) > 0.001))              warning("non-integer counts in a binomial glm!")         n <- y[, 1] + y[, 2]         y <- ifelse(n == 0, 0, y[, 1]/n)         weights <- weights * n         mustart <- (n * y + 0.5)/(n + 1)     }     else stop("for the binomial family, y must be a vector of 0 and 1's\n", "or a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
  .. ..$ validmu   :function (mu)  
  .. ..$ valideta  :function (eta)  
  .. ..$ simulate  :function (object, nsim)  
  .. ..- attr(*, "class")= chr "family"
  ..$ linear.predictors: Named num [1:40] -1.25 -1.24 -1.24 -1.24 -1.22 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ deviance         : num 42.7
  ..$ aic              : num 46.7
  ..$ null.deviance    : num 42.7
  ..$ iter             : int 4
  ..$ weights          : Named num [1:40] 0.173 0.174 0.174 0.174 0.176 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ prior.weights    : Named num [1:40] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ df.residual      : int 38
  ..$ df.null          : int 39
  ..$ y                : Named num [1:40] 0 0 0 0 1 0 1 0 0 0 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ converged        : logi TRUE
  ..$ boundary         : logi FALSE
  ..$ model            :'data.frame':   40 obs. of  2 variables:
  .. ..$ ldata$DFREE: int [1:40] 0 0 0 0 1 0 1 0 0 0 ...
  .. ..$ x          : int [1:40] 39 33 33 32 24 30 39 27 40 36 ...
  .. ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 ldata$DFREE ~ x
  .. .. .. ..- attr(*, "variables")= language list(ldata$DFREE, x)
  .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:2] "ldata$DFREE" "x"
  .. .. .. .. .. ..$ : chr "x"
  .. .. .. ..- attr(*, "term.labels")= chr "x"
  .. .. .. ..- attr(*, "order")= int 1
  .. .. .. ..- attr(*, "intercept")= int 1
  .. .. .. ..- attr(*, "response")= int 1
  .. .. .. ..- attr(*, ".Environment")=<environment: 0x017a5674> 
  .. .. .. ..- attr(*, "predvars")= language list(ldata$DFREE, x)
  .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. .. ..- attr(*, "names")= chr [1:2] "ldata$DFREE" "x"
  ..$ call             : language glm(formula = ldata$DFREE ~ x, family = binomial)
  ..$ formula          :Class 'formula' length 3 ldata$DFREE ~ x
  .. .. ..- attr(*, ".Environment")=<environment: 0x017a5674> 
  ..$ terms            :Classes 'terms', 'formula' length 3 ldata$DFREE ~ x
  .. .. ..- attr(*, "variables")= language list(ldata$DFREE, x)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "ldata$DFREE" "x"
  .. .. .. .. ..$ : chr "x"
  .. .. ..- attr(*, "term.labels")= chr "x"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: 0x017a5674> 
  .. .. ..- attr(*, "predvars")= language list(ldata$DFREE, x)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "ldata$DFREE" "x"
  ..$ data             :<environment: 0x017a5674> 
  ..$ offset           : NULL
  ..$ control          :List of 3
  .. ..$ epsilon: num 1e-08
  .. ..$ maxit  : num 25
  .. ..$ trace  : logi FALSE
  ..$ method           : chr "glm.fit"
  ..$ contrasts        : NULL
  ..$ xlevels          : Named list()
  ..- attr(*, "class")= chr [1:2] "glm" "lm"
 $ BECK   :List of 30

Here is the data i have used for the example. I shortened it for the purporse of this question:

> ldata
   ID AGE   BECK IVHX NDRUGTX RACE TREAT SITE DFREE
1   1  39  9.000    3       1    0     1    0     0
2   2  33 34.000    2       8    0     1    0     0
3   3  33 10.000    3       3    0     1    0     0
4   4  32 20.000    3       1    0     0    0     0
5   5  24  5.000    1       5    1     1    0     1
6   6  30 32.550    3       1    0     1    0     0
7   7  39 19.000    3      34    0     1    0     1
8   8  27 10.000    3       2    0     1    0     0
9   9  40 29.000    3       3    0     1    0     0
10 10  36 25.000    3       7    0     1    0     0
11 11  38 18.900    3       8    0     1    0     0
12 12  29 16.000    1       1    0     1    0     0
13 13  32 36.000    3       2    1     1    0     1
14 14  41 19.000    3       8    0     1    0     0
15 15  31 18.000    3       1    0     1    0     0
16 16  27 12.000    3       3    0     1    0     0
17 17  28 34.000    3       6    0     1    0     0
18 18  28 23.000    2       1    0     1    0     0
19 19  36 26.000    1      15    1     1    0     1
20 20  32 18.900    3       5    0     1    0     1
21 21  33 15.000    1       1    0     0    0     1
22 22  28 25.200    3       8    0     0    0     0
23 23  29  6.632    2       0    0     0    0     0
24 24  35  2.100    3       9    0     0    0     0
25 25  45 26.000    3       6    0     0    0     0
26 26  35 39.789    3       5    0     0    0     0
27 27  24 20.000    1       3    0     0    0     0
28 28  36 16.000    3       7    0     0    0     0
29 29  39 22.000    3       9    0     0    0     1
30 30  36  9.947    2      10    0     0    0     0
31 31  37  9.450    3       1    0     0    0     0
32 32  30 39.000    3       1    0     0    0     0
33 33  44 41.000    3       5    0     0    0     0
34 34  28 31.000    1       6    1     0    0     1
35 35  25 20.000    1       3    1     0    0     0
36 36  30  8.000    3       7    0     1    0     0
37 37  24  9.000    1       1    0     0    0     0
38 38  27 20.000    1       1    0     0    0     0
39 39  30  8.000    1       2    1     0    0     1
40 40  34  8.000    3       0    0     1    0     0
like image 891
Rene Bern Avatar asked Feb 18 '23 17:02

Rene Bern


1 Answers

Using an example from ?glm

## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())

## copy twice to a list to illustrate
lmod <- list(mod1 = glm.D93, mod2 = glm.D93)

Then we could compute them as summary() would, or extract them after calling summary(). The former is far more efficient as you only compute what you want. The latter doesn't rely on knowing how the standard errors are derived.

Compute the standard errors directly

The standard errors can be computed from the variance-covariance matrix of the model. The diagonal of this matrix contains the variances of the coefficients, and the standard errors are simply the square root of these variances. The vcov() extractor function gets the variance-covariance matrix for us and we square root the diagonals with sqrt(diag()):

> lapply(lmod, function(x) sqrt(diag(vcov(x))))
$mod1
(Intercept)    outcome2    outcome3  treatment2  treatment3 
  0.1708987   0.2021708   0.1927423   0.2000000   0.2000000 

$mod2
(Intercept)    outcome2    outcome3  treatment2  treatment3 
  0.1708987   0.2021708   0.1927423   0.2000000   0.2000000

Extract them from a call to summary()

Or we can let summary() compute the standard errors (and a lot more), then use lapply() or sapply() to apply an anonymous function that extracts coef(summary(x)) and takes the second column (in which the standard errors are stored).

lapply(lmod, function(x) coef(summary(x))[,2])

Which gives

> lapply(lmod, function(x) coef(summary(x))[,2])
$mod1
(Intercept)    outcome2    outcome3  treatment2  treatment3 
  0.1708987   0.2021708   0.1927423   0.2000000   0.2000000 

$mod2
(Intercept)    outcome2    outcome3  treatment2  treatment3 
  0.1708987   0.2021708   0.1927423   0.2000000   0.2000000

whereas sapply() would give:

> sapply(lmod, function(x) coef(summary(x))[,2])
                 mod1      mod2
(Intercept) 0.1708987 0.1708987
outcome2    0.2021708 0.2021708
outcome3    0.1927423 0.1927423
treatment2  0.2000000 0.2000000
treatment3  0.2000000 0.2000000

Depending on what you wanted to do , you could extract both the coefficients and the standard errors with a single call:

> lapply(lmod, function(x) coef(summary(x))[,1:2])
$mod1
                 Estimate Std. Error
(Intercept)  3.044522e+00  0.1708987
outcome2    -4.542553e-01  0.2021708
outcome3    -2.929871e-01  0.1927423
treatment2   1.337909e-15  0.2000000
treatment3   1.421085e-15  0.2000000

$mod2
                 Estimate Std. Error
(Intercept)  3.044522e+00  0.1708987
outcome2    -4.542553e-01  0.2021708
outcome3    -2.929871e-01  0.1927423
treatment2   1.337909e-15  0.2000000
treatment3   1.421085e-15  0.2000000

But you might prefer them separately?

like image 113
Gavin Simpson Avatar answered Feb 21 '23 15:02

Gavin Simpson