
R tilde operator: What does ~0+a mean?

Tags:

r

I have seen how to use the ~ operator in a formula. For example, y~x means: y is distributed as x.

However, I am really confused about what ~0+a means in this code:

require(limma)
a = factor(1:3)
model.matrix(~0+a)

Why does just model.matrix(a) not work? Why is the result of model.matrix(~a) different from that of model.matrix(~0+a)? And finally, what is the meaning of the ~ operator here?

Ali asked Oct 05 '12 00:10



1 Answer

~ creates a formula: it separates the left-hand and right-hand sides of a formula.

From ?`~`

Tilde is used to separate the left- and right-hand sides in a model formula.

Quoting from the help for formula

The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.

In addition to + and :, a number of other operators are useful in model formulae. The * operator denotes factor crossing: a*b interpreted as a+b+a:b. The ^ operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The %in% operator indicates that the terms on its left are nested within those on the right. For example a + b %in% a expands to the formula a + a:b. The - operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also used to remove the intercept term: when fitting a linear model y ~ x - 1 specifies a line through the origin. A model with no intercept can be also specified as y ~ x + 0 or y ~ 0 + x.
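The expansions described in the quoted help text can be checked directly with terms(), which parses a formula without needing the variables to exist. This is only a small sketch; the names a, b, c, x and y are placeholders, not data from the question.

```r
# Term labels show how a formula is expanded; the symbols need not exist.
labels_of <- function(f) attr(terms(f), "term.labels")

crossed <- labels_of(~ a * b)               # a*b expands to a + b + a:b
print(crossed)                              # "a" "b" "a:b"

# Second-order crossing with the a:b interaction removed again
reduced <- labels_of(~ (a + b + c)^2 - a:b)
print(reduced)

# The "intercept" attribute is 1 when an intercept is present, 0 otherwise
int_default <- attr(terms(y ~ x), "intercept")
int_minus1  <- attr(terms(y ~ x - 1), "intercept")
int_zero    <- attr(terms(y ~ 0 + x), "intercept")
print(c(int_default, int_minus1, int_zero)) # 1 0 0
```

Both y ~ x - 1 and y ~ 0 + x report an intercept of 0, which is the behaviour the ~0+a formula in the question relies on.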

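So, regarding the specific issue with ~0+a:

  • You are creating a model matrix without an intercept. Because a is a factor, model.matrix(~a) returns an intercept column that stands in for level a1, plus indicator columns for a2 and a3 (you need n-1 indicators to fully specify n classes). With ~0+a the intercept is dropped, so every level gets its own indicator column: a1, a2 and a3.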
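To see the difference between the two formulas from the question side by side, the following sketch builds both matrices for the same three-level factor. It assumes R's default treatment contrasts (the standard setting of options("contrasts")); limma is not needed for this, since model.matrix comes from the stats package.

```r
a <- factor(1:3)

# Default: an intercept column, with level 1 absorbed into it
with_intercept <- model.matrix(~a)
print(colnames(with_intercept))   # "(Intercept)" "a2" "a3"

# Intercept removed: one indicator column per level
no_intercept <- model.matrix(~0 + a)
print(colnames(no_intercept))     # "a1" "a2" "a3"

print(with_intercept)
print(no_intercept)
```

In the no-intercept version each row has a single 1 in the column for its own level, which is often the form wanted when every group should get its own coefficient.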

The help files for each function are well written, detailed and easy to find!

why doesn't model.matrix(a) work

model.matrix(a) doesn't work because a is a factor variable, not a formula or terms object

From the help for model.matrix

object: an object of an appropriate class. For the default method, a model formula or a terms object.

R is looking for a particular class of object. By passing the formula ~a, you are passing an object of class formula. model.matrix(terms(~a)) would also work (passing the terms object corresponding to the formula ~a).
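A quick way to confirm the class explanation is to inspect the objects themselves. This is a minimal sketch; the tryCatch wrapper is only there so the failing call can be shown without stopping the script.

```r
a <- factor(1:3)

print(class(a))          # "factor": neither a formula nor a terms object
print(class(~a))         # "formula"
print(class(terms(~a)))  # "terms" "formula"

# Passing the bare factor fails; capture the error instead of aborting
result <- tryCatch(model.matrix(a), error = function(e) e)
print(inherits(result, "error"))

# A formula and its terms object lead to the same columns
from_formula <- model.matrix(~a)
from_terms   <- model.matrix(terms(~a))
print(colnames(from_terms))
```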


general note

As @BenBolker helpfully notes in his comment, this is a modified version of Wilkinson-Rogers notation.

There is a good description in An Introduction to R.

mnel answered Sep 26 '22 14:09
