How can I compare regression coefficients across three (or more) groups using R?

Question

Sometimes your research may predict that the size of a regression coefficient may vary across groups. For example, you might believe that the regression coefficient of height predicting weight would differ across three age groups (young, middle age, senior citizen). Below, we have a data file with 3 fictional young people, 3 fictional middle age people, and 3 fictional senior citizens, along with their height and their weight. The variable age indicates the age group and is coded 1 for young people, 2 for middle aged, and 3 for senior citizens.

So, how can I compare regression coefficients (slope mainly) across three (or more) groups using R?

Sample data:

age height weight 
 1  56 140   
 1  60 155   
 1  64 143     
 2  56 117   
 2  60 125   
 2  64 133      
 3  74 245   
 3  75 241   
 3  82 269

So, how can I compare regression coefficients (slope mainly) across three (or more) groups using R?

Sample data:

age height weight 
 1  56 140   
 1  60 155   
 1  64 143     
 2  56 117   
 2  60 125   
 2  64 133      
 3  74 245   
 3  75 241   
 3  82 269

CAOC · Accepted Answer

There is an elegant answer to this in CrossValidated. But briefly,

require(emmeans)
data <- data.frame(age = factor(c(1,1,1,2,2,2,3,3,3)), 
               height = c(56,60,64,56,60,64,74,75,82), 
               weight = c(140,155,142,117,125,133,245,241,269))

model <- lm(weight ~ height*age, data)
anova(model) #check the results

Analysis of Variance Table

Response: weight
           Df  Sum Sq Mean Sq  F value    Pr(>F)    
height      1 25392.3 25392.3 481.5984 0.0002071 ***
age         2  2707.4  1353.7  25.6743 0.0129688 *  
height:age  2   169.0    84.5   1.6027 0.3361518    
Residuals   3   158.2    52.7                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


slopes <- emtrends(model, 'age', var = 'height') #gets each slope
slopes
age height.trend   SE df lower.CL upper.CL
1           0.25 1.28  3    -3.84     4.34
2           2.00 1.28  3    -2.09     6.09
3           3.37 1.18  3    -0.38     7.12

Confidence level used: 0.95 


pairs(slopes) #gets their comparisons two by two
contrast estimate   SE df t.ratio p.value
1 - 2       -1.75 1.82  3 -0.964  0.6441 
1 - 3       -3.12 1.74  3 -1.790  0.3125 
2 - 3       -1.37 1.74  3 -0.785  0.7363 

P value adjustment: tukey method for comparing a family of 3 estimates

G. Grothendieck · Answer

To determine whether the regression coefficients "differ across three age groups" we can use anova function in R. For example, using the data in the question and shown reproducibly in the note at the end:

fm1 <- lm(weight ~ height, DF)
fm3 <- lm(weight ~ age/(height - 1), DF)

giving the following which is significant at the 2.7% level so we would conclude that there are differences in the regression coefficients of the groups if we were using a 5% cutoff but not if we were using a 1% cutoff. The age/(height - 1) in the formula for fm3 means that height is nested in age and the overall intercept is omitted. Thus the model estimates separate intercepts and slopes for each age group. This is equivalent to age + age:height - 1.

> anova(fm1, fm3)
 Analysis of Variance Table

Model 1: weight ~ height
Model 2: weight ~ age/(height - 1)
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)  
1      7 2991.57                              
2      3  149.01  4    2842.6 14.307 0.02696 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note 1: Above fm3 has 6 coefficients, an intercept and slope for each group. If you want 4 coefficients, a common intercept and separate slopes, then use

lm(weight ~ age:height, DF)

Note 2: We can also compare a model in which subsets of levels are the same. For example, we can compare a model in which ages 1 and 2 are the same to models in which they are all the same (fm1) and all different (fm3):

fm2 <- lm(weight ~ age/(height - 1), transform(DF, age = factor(c(1, 1, 3)[age])))
anova(fm1, fm2, fm3)

If you do a large number of tests you can get significance on some just by chance so you will want to lower the cutoff for p values.

Note 3: There are some notes on lm formulas here: https://sites.google.com/site/r4naturalresources/r-topics/fitting-models/formulas

Note 4: We used this as the input:

Lines <- "age height weight
1 56 140
1 60 155
1 64 143
2 56 117
2 60 125
2 64 133
3 74 245
3 75 241
3 82 269"
DF <- read.table(text = Lines, header = TRUE)
DF$age <- factor(DF$age)

How can I compare regression coefficients across three (or more) groups using R?

Tags:

r

regression

LuisaL

2 Answers

CAOC

G. Grothendieck

Recent Activity

Donate For Us

How can I compare regression coefficients across three (or more) groups using R?

Tags:

r

regression

LuisaL

2 Answers

CAOC

G. Grothendieck

Related questions

Recent Activity

Donate For Us