Ok everyone, once and for all, how do you (emphasis on you, because I'm sure there is more than one way to achieve this) contrast code (treatment, sum, helmert, etc.) and retain a meaningful factor label (so you can make meaningful interpretations of effects) in the glm function?
I understand I can use level() to understand which factor level is the reference, but that get's tedious when I begin to involve factors with 5 or 10 levels and their interactions.
Here is a quick two factor example of what I mean
outcome <- c(1,0,0,1,1,0,0,0,1, 0, 0, 1)
firstvar <- c("A", "B", "C", "C", "B", "B", "A", "A", "C", "A", "C", "B")
secondvar <- c("D", "D", "E", "F", "F", "E", "D", "E", "F", "F", "D", "E")
df <- as.data.frame(cbind(outcome, firstvar, secondvar))
df$firstvar <- as.factor(df$firstvar)
df$secondvar <- as.factor(df$secondvar)
#not coded manually (and default appears to be dummy or treatment coding)
#gives meaningful factor labels in summary function
summary(glm(outcome ~ firstvar*secondvar, data=df, family="binomial"))
#effects coded
#does not give meaningful factor labels
contrasts(df$firstvar)=contr.sum(3)
contrasts(df$secondvar)=contr.sum(3)
summary(glm(outcome ~ firstvar*secondvar, data=df, family="binomial"))
#dummy coded
contrasts(df$firstvar)=contr.treatment(3);
contrasts(df$secondvar)=contr.treatment(3);
summary(glm(outcome ~ firstvar*secondvar, data=df, family="binomial"))
Any and all suggestions will be appreciated. This problem has bugged me for a while, and I'm sure there is a simple(ish) solution.
In order to set a contrast in R, you can use the contr. _X_() function for treatment, sum, and Helmert contrasts, or define any contrast manually. Be aware that this changes your dataset. You might want to consider creating a new variable as a copy of your original one, and set the contrasts on that variable.
The idea behind Helmert contrasts is to compare each group to the mean of the “previous” ones. That is, the first contrast represents the difference between group 2 and group 1, the second contrast represents the difference between group 3 and the mean of groups 1 and 2, and so on.
Contrast coding allows for recentering of categorical variables such that the intercept of a model is not the mean of one level of a category, but instead the mean of all data points in the data set.
Contrasts are used to test for differences among the levels of a factor. You can specify a contrast for each factor in the model. Contrasts represent linear combinations of the parameters.
Well, the simple answer (for contr.treatment
at least), is that you should pass the factor levels to the function, rather than just the total count. In most cases this will set the level names correctly. For example
contr.treatment(levels(df$firstvar))
# B C
# A 0 0
# B 1 0
# C 0 1
and then R uses the column names as labels/suffixes on the coefficients in the regression summary. However, even when passing labels, contr.sum
doesn't like to set column names. Here we can create our own wrapper though.
named.contr.sum<-function(x, ...) {
if (is.factor(x)) {
x <- levels(x)
} else if (is.numeric(x) & length(x)==1L) {
stop("cannot create names with integer value. Pass factor levels")
}
x<-contr.sum(x, ...)
colnames(x) <- apply(x,2,function(x)
paste(names(x[x>0]), names(x[x<0]), sep="-")
)
x
}
Here we are basically calling calling contr.sum
and just adding column names to the result (plus some error checking). You can run this with
named.contr.sum(levels(df$firstvar))
# A-C B-C
# A 1 0
# B 0 1
# C -1 -1
I decided to use "A-C" and "B-C" as labels, but you could change that in the code if you like. Then running
contrasts(df$firstvar)=named.contr.sum(levels(df$firstvar))
contrasts(df$secondvar)=named.contr.sum(levels(df$secondvar))
summary(glm(outcome ~ firstvar*secondvar, data=df, family="binomial"))
will give you
Call:
glm(formula = outcome ~ firstvar * secondvar, family = "binomial",
data = df)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.855e+00 5.023e+03 -0.001 0.999
firstvarA-C -6.855e+00 6.965e+03 -0.001 0.999
firstvarB-C 6.855e+00 6.965e+03 0.001 0.999
secondvarD-F -6.855e+00 6.965e+03 -0.001 0.999
secondvarE-F -6.855e+00 6.965e+03 -0.001 0.999
firstvarA-C:secondvarD-F 2.057e+01 8.473e+03 0.002 0.998
firstvarB-C:secondvarD-F -1.371e+01 1.033e+04 -0.001 0.999
firstvarA-C:secondvarE-F 7.072e-10 1.033e+04 0.000 1.000
firstvarB-C:secondvarE-F 6.855e+00 8.473e+03 0.001 0.999
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With