Custom contrasts are very widely used in analyses, e.g.: "Do DV values at level 1 and level 3 of this three-level factor differ significantly?"
Intuitively, this contrast is expressed in terms of cell means as:
c(1,0,-1)
One or more of these contrasts, bound as columns, form a contrast coefficient matrix, e.g.
mat = matrix(ncol = 2, byrow = TRUE, data = c(
1, 0,
0, 1,
-1, -1)
)
[,1] [,2]
[1,] 1 0
[2,] 0 1
[3,] -1 -1
However, when it comes to running these contrasts specified by the coefficient matrix, there is a lot of (apparently contradictory) information on the web and in books. My question is which information is correct?
In some examples, the user is shown that the intuitive contrast coefficient matrix can be used directly via the contrasts()
or C()
functions. So it's as simple as:
contrasts(myFactor) <- mat
Elsewhere (e.g. UCLA stats) we are told the coefficient matrix (or basis matrix) must be transformed from a coefficient matrix into a contrast matrix before use. This involves taking the inverse of the transform of the coefficient matrix: (mat')⁻¹
, or, in Rish:
contrasts(myFactor) = solve(t(mat))
This method requires padding the matrix with an initial column of means for the intercept. To avoid this, some sites recommend using a generalized inverse function which can cope with non-square matrices, i.e., MASS::ginv()
contrasts(myFactor) = ginv(t(mat))
Elsewhere again (e.g. a note from SPSS support), we learn the correct algebra is: (mat'mat)-¹ mat'
Implying to me that the correct way to create the contrasts matrix should be:
x = solve(t(mat)%*% mat)%*% t(mat)
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 1 0 -1
[3,] 0 1 -1
contrasts(myFactor) = x
My question is, which is right? (If I am interpreting and describing each piece of advice accurately). How does one specify custom contrasts in R for lm
, lme
etc?
Refs
Claim 2 is correct (see the answers here and here) and sometimes claim 1, too. This is because there are cases in which the generalized inverse of the (transposed) coefficient matrix is equal to the matrix itself.
For what it's worth....
If you have a factor with 3 levels (levels A, B, and C) and you want to test the following orthogonal contrasts: A vs B, and the avg. of A and B vs C, your contrast codes would be:
Cont1<- c(1,-1, 0)
Cont2<- c(.5,.5, -1)
If you do as directed on the UCLA site (transform coefficients to make a coding scheme), as such:
Contrasts(Variable)<- solve(t(cbind(c(1,1,1), Cont1, Cont2)))[,2:3]
then your results are IDENTICAL to if you had created two dummy variables (e.g.:
Dummy1<- ifelse(Variable=="A", 1, ifelse(Variable=="B", -1, 0))
Dummy2<- ifelse(Variable=="A", .5, ifelse(Variable=="B", .5, -1))
and entered them both into the regression equation instead of your factor, which makes me inclined to think that this is the correct way.
PS I don't write the most elegant R code, but it gets the job done. Sorry, I'm sure there are easier ways to recode variables, but you get the gist.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With