
Is there a better reference for R formulas than ?formula?


There are many redundant, and sometimes conflicting, ways of specifying formulae in R. Is there a more comprehensive yet concise reference for mapping conceptual models to R syntax than ?formula?

I am interested in a broad overview, including the syntax used to specify formulas in non-linear and hierarchical models such as glm, lmer, gam, earth, including (/) for nesting, random and fixed effects in mixed models, and s and te for splines, and others found in popular contributed packages.

Asked by Abe, Apr 30 '13 20:04




1 Answer

R comes with several manuals, which are accessible from vanilla R's "Help" menu at the top right when running R and are also in several places on-line.

Chapter 11 of "An Introduction to R" has a couple of pages on formulas, for example.

I don't know that it constitutes a "comprehensive" resource but it covers much* of what you need to know about how formulas work.

* Indeed, pretty much all of what perhaps 95% of users will ever use

The canonical reference to formulas in the S language might be

Chambers J.M., and Hastie T.J., eds. (1992), Statistical Models in S. Chapman & Hall, London.

though the origin of the approach comes from

Wilkinson G.N., and Rogers C.E. (1973). "Symbolic Description of Factorial Models for Analysis of Variance." Applied Statistics, 22, 392–399

A number of recent books related to R discuss formulas but I don't know that I'd call any of them comprehensive.

There are also numerous on-line resources (for example here) often with a good deal of very useful information.

That said, once you get comfortable with using formulas in R and so have a context into which more knowledge can be placed, the help page contains a surprising amount of information (along with other pages it links to). It is a bit terse and cryptic, but once you have the broader base of knowledge of R's particular way of working, it can be quite useful.

Specific questions relating to R formulas are likely to be on topic at either Stack Overflow or Cross Validated, depending on their content. There are already some quite advanced questions about formulas to be found there (a search like [r] formula might be fruitful), and it would be handy to have more such questions to help users struggling with these issues. If you have specific questions, I'd encourage you to ask.

As for 'redundant' and 'conflicting', I suppose you mean things like the fact that there is more than one way to specify a no-intercept model: y ~ . -1 and y ~ . +0 both work, for example, but each makes sense in a slightly different context.

In addition, there's the common bugbear of having to isolate quadratic and higher order terms from the formula interface (to use I(x^2) as a predictor so it's passed through the formula interface unharmed and survives far enough to be interpreted as an algebraic expression). Again, once you get a picture of what's going on 'behind the scenes' that seems much less of a nuisance.

Specific examples of the things I just mentioned:

    lm(dist ~ . -1, data=cars)                 # "remove-intercept-term" form of no-intercept
    lm(dist ~ . +0, data=cars)                 # "make-intercept-zero" form of no-intercept
    lm(dist ~ speed + speed^2, data=cars)      # doesn't do what we want here
    lm(dist ~ speed + I(speed^2), data=cars)   # gets us a quadratic term
    lm(dist ~ poly(speed,2), data=cars)        # avoid potential multicollinearity
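One way to get a picture of what happens 'behind the scenes' is to ask R how it expands a formula before any model is fitted. The following is only a minimal sketch using base R's terms(); the helper name term_labels is made up for illustration, and the variables y, a and b are placeholders that never need to exist because nothing is evaluated.

```r
# Hypothetical helper: list the terms R derives from a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

# Inside a formula, ^ means "cross to this degree", so for a single
# variable speed^2 collapses back to speed.
term_labels(dist ~ speed + speed^2)      # "speed"

# I() protects the expression, so it survives as its own term.
term_labels(dist ~ speed + I(speed^2))   # "speed" "I(speed^2)"

# Both no-intercept spellings clear the same intercept flag.
attr(terms(dist ~ speed - 1), "intercept")   # 0
attr(terms(dist ~ speed + 0), "intercept")   # 0

# * and / are shorthand that expands into main effects and interactions.
term_labels(y ~ a * b)   # "a" "b" "a:b"
term_labels(y ~ a / b)   # "a" "a:b"  (b nested within a)
```

This only covers the base formula operators. Package-specific pieces such as lmer's random-effect terms or gam's s() and te() are interpreted by those packages themselves, so their own help pages remain the reference for them.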

I agree that the formula interface could at least use a little further guidance and better examples in the ?formula help.

Answered by Glen_b, Sep 19 '22 17:09