I want to calculate the marginal means in a glm model with discrete predictors and unbalanced data. Using the emmeans function from the emmeans package gives me different results for weights="cells" and weights="proportional". The package documentation says that "proportional" uses weights in proportion to the frequencies (in the original data) of the factor combinations that are averaged over, and that "cells" uses weights according to the frequencies of the cells being averaged. But I do not understand what this really means. Please see a simplified version of my R code below.
I would appreciate any help.
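library(emmeans)
# Additive Gaussian model with two discrete predictors
model <- glm(formula = y ~ x1 + x2, data = df, family = gaussian)
# Marginal means of x1, averaging over x2 with each weighting scheme
marginal_means_cells <- summary(emmeans(model, "x1", weights = "cells"))
marginal_means_prop <- summary(emmeans(model, "x1", weights = "prop"))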
This is further discussed in the "messy data" vignette in the emmeans package. Suppose in a 2-factor design, you have cell frequencies like this:
              Treatment
Dose |   A    B    C | sum
-----+---------------+----
a    |   5    0    3 |   8
b    |  11    3    9 |  23
-----+---------------+----
sum  |  16    3   12 |  31
Then using "prop" weights, the marginal means for Treatment will all be computed with weights 8 and 23 (the Dose totals), and the marginal means for Dose will all be computed with weights 16, 3, and 12 (the Treatment totals).
Using "cells" weights, the marginal means for Treatment are computed with weights 5 and 11 for A, 0 and 3 for B, and 3 and 9 for C; and the marginal means for Dose are computed with weights 5, 0, 3 for a, and 11, 3, 9 for b.
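To make the arithmetic concrete, here is a minimal base-R sketch of the two weighting schemes for the Treatment means. Only the cell counts come from the table above; the cell predictions (dose_eff, treat_eff) are made-up numbers standing in for the predictions of an additive model, and the sketch does not call emmeans itself.

```r
# Cell counts from the table above (rows: Dose, columns: Treatment)
n <- matrix(c(5, 0, 3,
              11, 3, 9),
            nrow = 2, byrow = TRUE,
            dimnames = list(Dose = c("a", "b"), Treatment = c("A", "B", "C")))

# Hypothetical additive-model predictions for each cell (made-up effects)
dose_eff  <- c(a = 0, b = 2)
treat_eff <- c(A = 10, B = 12, C = 15)
pred <- outer(dose_eff, treat_eff, "+")  # 2 x 3 matrix of cell predictions

# "proportional": every Treatment mean uses the same Dose weights (8 and 23)
prop_w   <- rowSums(n)
emm_prop <- apply(pred, 2, weighted.mean, w = prop_w)

# "cells": each Treatment mean uses its own column of cell counts
emm_cells <- sapply(colnames(pred),
                    function(j) weighted.mean(pred[, j], w = n[, j]))

round(emm_prop, 3)   # A = 11.484, B = 13.484, C = 16.484
emm_cells            # A = 11.375, B = 14,     C = 16.5
```

With these made-up predictions, the Treatment B mean under "cells" weights equals the prediction for cell (b, B) alone, because the empty cell (a, B) gets weight 0.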
If the model contains the Treatment:Dose interaction, the cell with 0 observations cannot even be estimated, and hence, using "prop" weights, neither can the marginal means for Treatment B or Dose a. Those marginal means can, however, be estimated with "cells" weights, because the empty cell then receives weight 0.
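The empty-cell problem can be seen without emmeans at all. The following base-R sketch builds a data set with the cell counts from the table, using a simulated response (random noise, purely an assumption for illustration), and fits the interaction model with lm. One coefficient comes back as NA, which is the rank deficiency that makes the (a, B) cell non-estimable.

```r
# Rebuild a data set with the cell counts from the table above
cells <- expand.grid(Dose = c("a", "b"), Treatment = c("A", "B", "C"))
cells$n <- c(5, 11, 0, 3, 3, 9)  # order: (a,A), (b,A), (a,B), (b,B), (a,C), (b,C)
df <- cells[rep(seq_len(nrow(cells)), cells$n), c("Dose", "Treatment")]

# Simulated response: random noise, for illustration only
set.seed(1)
df$y <- rnorm(nrow(df))

# Interaction model: the empty (a, B) cell makes the design rank-deficient
fit <- lm(y ~ Dose * Treatment, data = df)

table(df$Dose, df$Treatment)  # reproduces the frequency table
coef(fit)                     # Doseb:TreatmentB is NA
```

Any average that gives the (a, B) cell a nonzero weight inherits this problem, which is why "prop" weights fail for Treatment B and Dose a in the interaction model.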