Changing the labels of your random-effects grouping variable changes the results in lme4

Question

The title says it all: Changing the (supposedly arbitrary) labels of your random-effects grouping variable (e.g. the names of your subjects in a repeated-measures experiment) can change the resulting output in lme4. Minimal example:

require(dplyr)
require(lme4)
require(digest)
df = faithful %>% mutate(subject = rep(as.character(1:8), each = 34),
                         subject2 = rep(as.character(9:16), each = 34))
summary(lmer(eruptions ~ waiting + (waiting | subject), data = df))$coefficients[2,1] # = 0.07564181
summary(lmer(eruptions ~ waiting + (waiting | subject2), data = df))$coefficients[2,1] # = 0.07567655

I think it happens because lme4 converts them to a factor, and different names produce different factor level orderings. E.g. this produces the problem:

df2 = faithful %>% mutate(subject = factor(rep(as.character(1:8), each = 34)),
                          subject2 = factor(rep(as.character(9:16), each = 34)))
summary(lmer(eruptions ~ waiting + (waiting | subject), data = df2))$coefficients[2,1] # = 0.07564181
summary(lmer(eruptions ~ waiting + (waiting | subject2), data = df2))$coefficients[2,1] # = 0.07567655

But this doesn't:

df3 = faithful %>% mutate(subject = factor(rep(as.character(1:8), each = 34)),
                          subject2 = factor(rep(as.character(1:8), each = 34),
                                            levels = as.character(1:8),
                                            labels = as.character(9:16)))
summary(lmer(eruptions ~ waiting + (waiting | subject), data = df3))$coefficients[2,1] # = 0.07564181
summary(lmer(eruptions ~ waiting + (waiting | subject2), data = df3))$coefficients[2,1] # = 0.07564181

This seems like an issue in lme4. Different arbitrary variable labels shouldn't produce different output, right? Am I missing something? Why does lme4 do this?

(I know the difference in output is small, but I got bigger differences in other cases, enough to, e.g., change a p value from .055 to .045. Also, if this is right, I think it could cause slight reproducibility issues -- e.g. if, after finishing their analyses, an experimenter anonymizes their human subjects data (by changing the names) and then posts it in a public repository.)

jay.sf · Accepted Answer

The first part of your sequence 1:8 gives the same order in numeric or character format, whereas the second part doesn't:

identical(order(1:8), order(as.character(1:8)))
# [1] TRUE
identical(order(9:16), order(as.character(9:16)))
# [1] FALSE

That's because numbers as characters are sorted by their first digit:

sort(9:16)
# [1]  9 10 11 12 13 14 15 16
sort(as.character(9:16))
# [1] "10" "11" "12" "13" "14" "15" "16" "9"

So if you use two different but one-digit character sequences there is seemingly no issue:

library(lme4)
fo1 <- eruptions ~ waiting + (waiting | sub)
fo2 <- eruptions ~ waiting + (waiting | sub2)

df1 <- transform(faithful, sub=rep(as.character(1:8), each=34), 
                 sub2=rep(as.character(2:9), each=34))

summary(lmer(fo1, data=df1))$coe[2, 1]
# boundary (singular) fit: see ?isSingular
# [1] 0.07564181
summary(lmer(fo2, data=df1))$coe[2, 1]
# boundary (singular) fit: see ?isSingular
# [1] 0.07564181

However, the order of your grouping variables indeed do matter in lmer(). This can be shown by giving subject and subject2 the same levels but a different order:

set.seed(840947)
df2 <- transform(faithful, sub=rep(sample(1:8), each=34), sub2=rep(sample(1:8), each=34))

summary(fit2a <- lmer(fo1, data=df2))$coe[2, 1]
# boundary (singular) fit: see ?isSingular
# [1] 0.07564179
summary(fit2b <- lmer(fo2, data=df2))$coe[2, 1]
# boundary (singular) fit: see ?isSingular
# [1] 0.07567537

This yields completely different coefficients one more time. The levels and level orders may be inspected like so:

fit2a@flist$sub
# [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# [33] 4 4 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
# [65] 8 8 8 8 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
# [97] 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
# [129] 3 3 3 3 3 3 3 3 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
# [161] 6 6 6 6 6 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# [193] 1 1 1 1 1 1 1 1 1 1 1 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
# [225] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# [257] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# Levels: 1 2 3 4 5 6 7 8

fit2b@flist$sub2
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# [33] 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
# [65] 2 2 2 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
# [97] 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
# [129] 8 8 8 8 8 8 8 8 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# [161] 7 7 7 7 7 7 7 7 7 7 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
# [193] 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
# [225] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# [257] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# Levels: 1 2 3 4 5 6 7 8

There is already a ticket filed at github where you should join. Perhaps try to find a similar case beforehand where there is an ordering problem, but not a singular fit.

Changing the labels of your random-effects grouping variable changes the results in lme4

Tags:

r

lme4

mixed-models

Adam Morris

1 Answers

jay.sf

Recent Activity

Donate For Us

Changing the labels of your random-effects grouping variable changes the results in lme4

Tags:

r

lme4

mixed-models

Adam Morris

1 Answers

jay.sf

Related questions

Recent Activity

Donate For Us