Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Custom contrasts in R: contrast coefficient matrix or contrast matrix / coding scheme? And how to get there?

Tags:

r

matrix

anova

Custom contrasts are very widely used in analyses, e.g.: "Do DV values at level 1 and level 3 of this three-level factor differ significantly?"

Intuitively, this contrast is expressed in terms of cell means as:

c(1,0,-1)

One or more of these contrasts, bound as columns, form a contrast coefficient matrix, e.g.

mat = matrix(ncol = 2, byrow = TRUE, data = c(
    1,  0,
    0,  1,
   -1, -1)
)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
[3,]   -1   -1

However, when it comes to running these contrasts specified by the coefficient matrix, there is a lot of (apparently contradictory) information on the web and in books. My question is which information is correct?

Claim 1: contrasts(factor) takes a coefficient matrix

In some examples, the user is shown that the intuitive contrast coefficient matrix can be used directly via the contrasts() or C() functions. So it's as simple as:

contrasts(myFactor) <- mat

Claim 2: Transform coefficients to create a coding scheme

Elsewhere (e.g. UCLA stats) we are told the coefficient matrix (or basis matrix) must be transformed from a coefficient matrix into a contrast matrix before use. This involves taking the inverse of the transform of the coefficient matrix: (mat')⁻¹, or, in Rish:

contrasts(myFactor) = solve(t(mat))

This method requires padding the matrix with an initial column of means for the intercept. To avoid this, some sites recommend using a generalized inverse function which can cope with non-square matrices, i.e., MASS::ginv()

contrasts(myFactor) = ginv(t(mat))

Third option: premultiply by the transform, take the inverse, and post multiply by the transform

Elsewhere again (e.g. a note from SPSS support), we learn the correct algebra is: (mat'mat)-¹ mat'

Implying to me that the correct way to create the contrasts matrix should be:

x = solve(t(mat)%*% mat)%*% t(mat)
     [,1] [,2] [,3]
[1,]    0    0    1
[2,]    1    0   -1
[3,]    0    1   -1

contrasts(myFactor) = x

My question is, which is right? (If I am interpreting and describing each piece of advice accurately). How does one specify custom contrasts in R for lm, lme etc?

Refs

like image 851
tim Avatar asked Aug 04 '15 19:08

tim


2 Answers

Claim 2 is correct (see the answers here and here) and sometimes claim 1, too. This is because there are cases in which the generalized inverse of the (transposed) coefficient matrix is equal to the matrix itself.

like image 89
statmerkur Avatar answered Oct 02 '22 20:10

statmerkur


For what it's worth....

If you have a factor with 3 levels (levels A, B, and C) and you want to test the following orthogonal contrasts: A vs B, and the avg. of A and B vs C, your contrast codes would be:

Cont1<- c(1,-1, 0)
Cont2<- c(.5,.5, -1)

If you do as directed on the UCLA site (transform coefficients to make a coding scheme), as such:

Contrasts(Variable)<- solve(t(cbind(c(1,1,1), Cont1, Cont2)))[,2:3]

then your results are IDENTICAL to if you had created two dummy variables (e.g.:

Dummy1<- ifelse(Variable=="A", 1, ifelse(Variable=="B", -1, 0))
Dummy2<- ifelse(Variable=="A", .5, ifelse(Variable=="B", .5, -1))

and entered them both into the regression equation instead of your factor, which makes me inclined to think that this is the correct way.

PS I don't write the most elegant R code, but it gets the job done. Sorry, I'm sure there are easier ways to recode variables, but you get the gist.

like image 40
Liz Avatar answered Oct 02 '22 18:10

Liz