The title says it all: Changing the (supposedly arbitrary) labels of your random-effects grouping variable (e.g. the names of your subjects in a repeated-measures experiment) can change the resulting output in lme4. Minimal example:
require(dplyr)
require(lme4)
require(digest)
df = faithful %>% mutate(subject = rep(as.character(1:8), each = 34),
subject2 = rep(as.character(9:16), each = 34))
summary(lmer(eruptions ~ waiting + (waiting | subject), data = df))$coefficients[2,1] # = 0.07564181
summary(lmer(eruptions ~ waiting + (waiting | subject2), data = df))$coefficients[2,1] # = 0.07567655
I think it happens because lme4 converts them to a factor, and different names produce different factor level orderings. E.g. this produces the problem:
df2 = faithful %>% mutate(subject = factor(rep(as.character(1:8), each = 34)),
subject2 = factor(rep(as.character(9:16), each = 34)))
summary(lmer(eruptions ~ waiting + (waiting | subject), data = df2))$coefficients[2,1] # = 0.07564181
summary(lmer(eruptions ~ waiting + (waiting | subject2), data = df2))$coefficients[2,1] # = 0.07567655
But this doesn't:
df3 = faithful %>% mutate(subject = factor(rep(as.character(1:8), each = 34)),
subject2 = factor(rep(as.character(1:8), each = 34),
levels = as.character(1:8),
labels = as.character(9:16)))
summary(lmer(eruptions ~ waiting + (waiting | subject), data = df3))$coefficients[2,1] # = 0.07564181
summary(lmer(eruptions ~ waiting + (waiting | subject2), data = df3))$coefficients[2,1] # = 0.07564181
This seems like an issue in lme4. Different arbitrary variable labels shouldn't produce different output, right? Am I missing something? Why does lme4 do this?
(I know the difference in output is small, but I got bigger differences in other cases, enough to, e.g., change a p value from .055 to .045. Also, if this is right, I think it could cause slight reproducibility issues -- e.g. if, after finishing their analyses, an experimenter anonymizes their human subjects data (by changing the names) and then posts it in a public repository.)
The first part of your sequence 1:8 gives the same order in numeric or character format, whereas the second part doesn't:
identical(order(1:8), order(as.character(1:8)))
# [1] TRUE
identical(order(9:16), order(as.character(9:16)))
# [1] FALSE
That's because numbers as characters are sorted by their first digit:
sort(9:16)
# [1] 9 10 11 12 13 14 15 16
sort(as.character(9:16))
# [1] "10" "11" "12" "13" "14" "15" "16" "9"
So if you use two different but one-digit character sequences there is seemingly no issue:
library(lme4)
fo1 <- eruptions ~ waiting + (waiting | sub)
fo2 <- eruptions ~ waiting + (waiting | sub2)
df1 <- transform(faithful, sub=rep(as.character(1:8), each=34),
sub2=rep(as.character(2:9), each=34))
summary(lmer(fo1, data=df1))$coe[2, 1]
# boundary (singular) fit: see ?isSingular
# [1] 0.07564181
summary(lmer(fo2, data=df1))$coe[2, 1]
# boundary (singular) fit: see ?isSingular
# [1] 0.07564181
However, the order of your grouping variables indeed do matter in lmer(). This can be shown by giving subject and subject2 the same levels but a different order:
set.seed(840947)
df2 <- transform(faithful, sub=rep(sample(1:8), each=34), sub2=rep(sample(1:8), each=34))
summary(fit2a <- lmer(fo1, data=df2))$coe[2, 1]
# boundary (singular) fit: see ?isSingular
# [1] 0.07564179
summary(fit2b <- lmer(fo2, data=df2))$coe[2, 1]
# boundary (singular) fit: see ?isSingular
# [1] 0.07567537
This yields completely different coefficients one more time. The levels and level orders may be inspected like so:
fit2a@flist$sub
# [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# [33] 4 4 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
# [65] 8 8 8 8 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
# [97] 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
# [129] 3 3 3 3 3 3 3 3 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
# [161] 6 6 6 6 6 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# [193] 1 1 1 1 1 1 1 1 1 1 1 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
# [225] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# [257] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# Levels: 1 2 3 4 5 6 7 8
fit2b@flist$sub2
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# [33] 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
# [65] 2 2 2 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
# [97] 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
# [129] 8 8 8 8 8 8 8 8 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# [161] 7 7 7 7 7 7 7 7 7 7 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
# [193] 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
# [225] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# [257] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# Levels: 1 2 3 4 5 6 7 8
There is already a ticket filed at github where you should join. Perhaps try to find a similar case beforehand where there is an ordering problem, but not a singular fit.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With