If I have a summary table for a linear model in R, how can I get the p-values associated with just the interaction estimates, or just the group intercepts, etc., without having to count the row numbers?
For example, with a model such as lm(y ~ x + group)
with x
as continuous and group
as categorical, the summary table for the lm
object has estimates for:
I would like to figure out a way to get each of these as a group of p-values, even if the number of groups or the model formula change. Maybe there is information the summary table somehow uses to group together the rows?
Following is an example data set with two different models. The first model has four different sets of p-values I might want to get separately, while the second model has only two sets of p-values.
x <- 1:100
groupA <- .5*x + 10 + rnorm(length(x), 0, 1)
groupB <- .5*x + 20 + rnorm(length(x), 0, 1)
groupC <- .5*x + 30 + rnorm(length(x), 0, 1)
groupD <- .5*x + 40 + rnorm(length(x), 0, 1)
groupE <- .5*x + 50 + rnorm(length(x), 0, 1)
groupF <- .5*x + 60 + rnorm(length(x), 0, 1)
myData <- data.frame(x = x,
y = c(groupA, groupB, groupC, groupD, groupE, groupF),
group = rep(c("A","B","C","D","E","F"), each = length(x))
)
myMod1 <- lm(y ~ x + group + x:group, data = myData)
myMod2 <- lm(y ~ group + x:group - 1, data = myData)
summary(myMod1)
summary(myMod2)
You can access all coefficients and their associated statistics via summary()$coefficients
, like so:
> summary(myMod1)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.8598180335 0.207551769 47.50534335 1.882690e-203
x 0.5013049448 0.003568152 140.49427911 0.000000e+00
groupB 9.9833257879 0.293522526 34.01212819 5.343527e-141
groupC 20.0988336744 0.293522526 68.47458673 2.308586e-282
groupD 30.0671851583 0.293522526 102.43569906 0.000000e+00
groupE 39.8366758058 0.293522526 135.71931370 0.000000e+00
groupF 50.4780382104 0.293522526 171.97330259 0.000000e+00
x:groupB -0.0001115097 0.005046129 -0.02209807 9.823772e-01
x:groupC 0.0004144536 0.005046129 0.08213297 9.345689e-01
x:groupD 0.0022577223 0.005046129 0.44741668 6.547390e-01
x:groupE 0.0024544207 0.005046129 0.48639675 6.268671e-01
x:groupF -0.0052089956 0.005046129 -1.03227556 3.023674e-01
Of this you only want the p-values, i.e. the 4th column:
> summary(myMod1)$coefficients[,4]
(Intercept) x groupB groupC groupD groupE groupF x:groupB x:groupC
1.882690e-203 0.000000e+00 5.343527e-141 2.308586e-282 0.000000e+00 0.000000e+00 0.000000e+00 9.823772e-01 9.345689e-01
x:groupD x:groupE x:groupF
6.547390e-01 6.268671e-01 3.023674e-01
Lastly, you only want the p-values of particular coefficients, either for the intercepts or for the interaction terms. One way of doing this is to match the coefficient names (names(summary(myMod1)$coefficients[,4])
) to a RegEx via grepl()
and using the logical vector that grepl
returns as an index:
> # all group dummies
> summary(myMod1)$coefficients[grepl('^group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
groupB groupC groupD groupE groupF
5.343527e-141 2.308586e-282 0.000000e+00 0.000000e+00 0.000000e+00
> # all interaction terms
> summary(myMod1)$coefficients[grepl('^x:group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
x:groupB x:groupC x:groupD x:groupE x:groupF
0.9823772 0.9345689 0.6547390 0.6268671 0.3023674
There's now the broom package to handle output of statistical functions. In this case, with its tidy()
function:
library(broom)
tidy(myMod1)
term estimate std.error statistic p.value
1 (Intercept) 10.0379389850 0.19497112 51.4842342 5.143448e-220
2 x 0.5009946732 0.00335187 149.4672019 0.000000e+00
3 groupB 9.8949134549 0.27573081 35.8861368 3.002513e-150
4 groupC 19.8437942091 0.27573081 71.9679981 1.021613e-293
5 groupD 29.9055587100 0.27573081 108.4592579 0.000000e+00
6 groupE 39.7258414666 0.27573081 144.0747296 0.000000e+00
7 groupF 50.1210013973 0.27573081 181.7751231 0.000000e+00
8 x:groupB -0.0005319302 0.00474026 -0.1122154 9.106909e-01
9 x:groupC -0.0010145553 0.00474026 -0.2140294 8.305983e-01
10 x:groupD -0.0025544113 0.00474026 -0.5388757 5.901766e-01
11 x:groupE 0.0045780202 0.00474026 0.9657740 3.345543e-01
12 x:groupF -0.0058636354 0.00474026 -1.2369859 2.165861e-01
The result is a data.frame, so you can easily filter for interaction terms (which have a colon in their names):
pvals <- tidy(myMod1)[, c(1,5)]
pvals[grepl(":", pvals$term), ]
term p.value
8 x:groupB 0.9106909
9 x:groupC 0.8305983
10 x:groupD 0.5901766
11 x:groupE 0.3345543
12 x:groupF 0.2165861
broom
plays well with the dplyr package; for instance, to extract the non-interaction group coefficients:
library(dplyr)
tidy(myMod1) %>%
select(term, p.value) %>%
filter(! grepl(":", term), term != "(Intercept)", term != "x")
term p.value
1 groupB 3.002513e-150
2 groupC 1.021613e-293
3 groupD 0.000000e+00
4 groupE 0.000000e+00
5 groupF 0.000000e+00
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With