Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In R linear model, get p-values for only the interaction coefficients

Tags:

r

lm

If I have a summary table for a linear model in R, how can I get the p-values associated with just the interaction estimates, or just the group intercepts, etc., without having to count the row numbers?

For example, with a model such as lm(y ~ x + group) with x as continuous and group as categorical, the summary table for the lm object has estimates for:

  1. an intercept
  2. x, the slope across all groups
  3. 5 within group differences from the overall intercept
  4. 5 within group differences from the overall slope.

I would like to figure out a way to get each of these as a group of p-values, even if the number of groups or the model formula change. Maybe there is information the summary table somehow uses to group together the rows?

Following is an example data set with two different models. The first model has four different sets of p-values I might want to get separately, while the second model has only two sets of p-values.

x <- 1:100
groupA <- .5*x + 10 + rnorm(length(x), 0, 1)
groupB <- .5*x + 20 + rnorm(length(x), 0, 1)
groupC <- .5*x + 30 + rnorm(length(x), 0, 1)
groupD <- .5*x + 40 + rnorm(length(x), 0, 1)
groupE <- .5*x + 50 + rnorm(length(x), 0, 1)
groupF <- .5*x + 60 + rnorm(length(x), 0, 1)

myData <- data.frame(x = x,
    y = c(groupA, groupB, groupC, groupD, groupE, groupF),
    group = rep(c("A","B","C","D","E","F"), each = length(x))
)

myMod1 <- lm(y ~ x + group + x:group, data = myData)
myMod2 <- lm(y ~ group + x:group - 1, data = myData)
summary(myMod1)
summary(myMod2)
like image 720
Jdub Avatar asked Jun 27 '12 15:06

Jdub


2 Answers

You can access all coefficients and their associated statistics via summary()$coefficients, like so:

> summary(myMod1)$coefficients
                 Estimate  Std. Error      t value      Pr(>|t|)
(Intercept)  9.8598180335 0.207551769  47.50534335 1.882690e-203
x            0.5013049448 0.003568152 140.49427911  0.000000e+00
groupB       9.9833257879 0.293522526  34.01212819 5.343527e-141
groupC      20.0988336744 0.293522526  68.47458673 2.308586e-282
groupD      30.0671851583 0.293522526 102.43569906  0.000000e+00
groupE      39.8366758058 0.293522526 135.71931370  0.000000e+00
groupF      50.4780382104 0.293522526 171.97330259  0.000000e+00
x:groupB    -0.0001115097 0.005046129  -0.02209807  9.823772e-01
x:groupC     0.0004144536 0.005046129   0.08213297  9.345689e-01
x:groupD     0.0022577223 0.005046129   0.44741668  6.547390e-01
x:groupE     0.0024544207 0.005046129   0.48639675  6.268671e-01
x:groupF    -0.0052089956 0.005046129  -1.03227556  3.023674e-01

Of this you only want the p-values, i.e. the 4th column:

> summary(myMod1)$coefficients[,4]
  (Intercept)             x        groupB        groupC        groupD        groupE        groupF      x:groupB      x:groupC 
1.882690e-203  0.000000e+00 5.343527e-141 2.308586e-282  0.000000e+00  0.000000e+00  0.000000e+00  9.823772e-01  9.345689e-01 
     x:groupD      x:groupE      x:groupF 
 6.547390e-01  6.268671e-01  3.023674e-01 

Lastly, you only want the p-values of particular coefficients, either for the intercepts or for the interaction terms. One way of doing this is to match the coefficient names (names(summary(myMod1)$coefficients[,4])) to a RegEx via grepl() and using the logical vector that grepl returns as an index:

> # all group dummies
> summary(myMod1)$coefficients[grepl('^group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
       groupB        groupC        groupD        groupE        groupF 
5.343527e-141 2.308586e-282  0.000000e+00  0.000000e+00  0.000000e+00 
> # all interaction terms
> summary(myMod1)$coefficients[grepl('^x:group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
 x:groupB  x:groupC  x:groupD  x:groupE  x:groupF 
0.9823772 0.9345689 0.6547390 0.6268671 0.3023674 
like image 52
RoyalTS Avatar answered Nov 15 '22 19:11

RoyalTS


There's now the broom package to handle output of statistical functions. In this case, with its tidy() function:

library(broom)
tidy(myMod1)

          term      estimate  std.error   statistic       p.value
1  (Intercept) 10.0379389850 0.19497112  51.4842342 5.143448e-220
2            x  0.5009946732 0.00335187 149.4672019  0.000000e+00
3       groupB  9.8949134549 0.27573081  35.8861368 3.002513e-150
4       groupC 19.8437942091 0.27573081  71.9679981 1.021613e-293
5       groupD 29.9055587100 0.27573081 108.4592579  0.000000e+00
6       groupE 39.7258414666 0.27573081 144.0747296  0.000000e+00
7       groupF 50.1210013973 0.27573081 181.7751231  0.000000e+00
8     x:groupB -0.0005319302 0.00474026  -0.1122154  9.106909e-01
9     x:groupC -0.0010145553 0.00474026  -0.2140294  8.305983e-01
10    x:groupD -0.0025544113 0.00474026  -0.5388757  5.901766e-01
11    x:groupE  0.0045780202 0.00474026   0.9657740  3.345543e-01
12    x:groupF -0.0058636354 0.00474026  -1.2369859  2.165861e-01

The result is a data.frame, so you can easily filter for interaction terms (which have a colon in their names):

pvals <- tidy(myMod1)[, c(1,5)]
pvals[grepl(":", pvals$term), ]

       term   p.value
8  x:groupB 0.9106909
9  x:groupC 0.8305983
10 x:groupD 0.5901766
11 x:groupE 0.3345543
12 x:groupF 0.2165861

broom plays well with the dplyr package; for instance, to extract the non-interaction group coefficients:

library(dplyr)
tidy(myMod1) %>%
  select(term, p.value) %>%
  filter(! grepl(":", term), term != "(Intercept)", term != "x")

    term       p.value
1 groupB 3.002513e-150
2 groupC 1.021613e-293
3 groupD  0.000000e+00
4 groupE  0.000000e+00
5 groupF  0.000000e+00
like image 34
Sam Firke Avatar answered Nov 15 '22 18:11

Sam Firke