 

What does a R multi-part formula mean in mathematical terms?

Tags: math, r, formula

The R Formula package introduces notation for multipart formulas such as y ~ x1 + x2 | I(x1^2). What does this formula mean mathematically? How does it differ from y ~ x1 + x2 + I(x1^2), or from the two separate formulas y ~ x1 + x2 and y ~ I(x1^2)?

user914532 asked Dec 08 '10 20:12



1 Answer

You seem to misunderstand what the Formula package is for. Multipart formulas can mean whatever you, as the user or developer, want them to mean. Formula provides the syntactic sugar around its more flexible formula notation. A multipart formula doesn't mean anything until you process it to convert the symbolic representation into model matrices or similar.
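To make the "symbolic until processed" point concrete, here is a minimal sketch of how the Formula package lets you pull a multipart formula apart. It assumes the Formula package is installed; the two-part formula is the kind discussed in this answer, and the data frame dat is made-up toy data used only for illustration.

```r
# Assumes the Formula package is installed: install.packages("Formula")
library(Formula)

# A two-part right-hand side; the `|` only separates parts, it implies no model
f <- Formula(y ~ x1 + x2 | z1 + z2 + z3)

length(f)                      # number of parts on the LHS and on the RHS
formula(f, lhs = 1, rhs = 1)   # response with the first RHS part only
formula(f, lhs = 0, rhs = 2)   # second RHS part only, no response

# Hypothetical toy data, made up purely for illustration
set.seed(1)
dat <- data.frame(y  = rnorm(10), x1 = rnorm(10), x2 = rnorm(10),
                  z1 = rnorm(10), z2 = rnorm(10), z3 = rnorm(10))

# One model frame for all variables, then a separate model matrix per RHS part
mf <- model.frame(f, data = dat)
X <- model.matrix(f, data = mf, rhs = 1)  # columns: (Intercept), x1, x2
Z <- model.matrix(f, data = mf, rhs = 2)  # columns: (Intercept), z1, z2, z3
```

Nothing here fits a model. What X and Z are then used for is entirely up to the function that receives them.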

The example you quote in your follow-on "Answer" is y ~ x1 + x2 | z1 + z2 + z3. This is for an instrumental variables model fitted by two-stage OLS (two-stage least squares). The part after the | (z1 + z2 + z3) is interpreted by the ivcoef() function as the instruments (IVs), whilst the part before the | (x1 + x2) is interpreted as the regression covariates. ivcoef() builds two model matrices from these two parts of the RHS of the formula so that it can fit the two-stage OLS. Formula provides the code to handle and manipulate these multipart formulas; it doesn't specify what statistical models they represent.
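For this instrumental variables reading, one standard way to write the two-stage estimator is sketched below. The symbols are my own notation, not something the Formula package defines: X is the model matrix built from x1 + x2, Z is the model matrix built from z1 + z2 + z3, and P_Z is the projection onto the columns of Z.

```latex
% Structural equation, with the instruments assumed uncorrelated with the errors
y = X\beta + \varepsilon, \qquad \mathrm{E}[Z^\top \varepsilon] = 0

% Stage 1: project the covariates onto the instruments
\hat{X} = P_Z X, \qquad P_Z = Z (Z^\top Z)^{-1} Z^\top

% Stage 2: regress y on the fitted covariates
\hat{\beta}_{\mathrm{2SLS}} = (\hat{X}^\top \hat{X})^{-1} \hat{X}^\top y
                            = (X^\top P_Z X)^{-1} X^\top P_Z y
```

Under this interpretation the | has a precise meaning, but only because the fitting function chose to treat the second part as Z.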

Another example is the hurdle() function in package pscl, which uses the Formula functionality. There, the same formula y ~ x1 + x2 | z1 + z2 + z3 is interpreted differently: the z1 + z2 + z3 part is used for the zero hurdle (the binomial part of the hurdle model), whilst the x1 + x2 part is used for the count part of the hurdle model.
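Written out, a hurdle model of this kind could look like the sketch below. It assumes a Poisson count distribution f with a log link and a binomial zero hurdle with a logit link; those are common choices that I am using for illustration, and other distributions and links are possible. Here x_i holds the covariates from x1 + x2, z_i holds those from z1 + z2 + z3, and pi_i is the probability of crossing the hurdle, that is, of a non-zero count.

```latex
% Zero hurdle (binomial part) and truncated count part
\Pr(Y_i = 0) = 1 - \pi_i, \qquad
\Pr(Y_i = k) = \pi_i \, \frac{f(k; \mu_i)}{1 - f(0; \mu_i)}, \quad k = 1, 2, \dots

% Each part of the formula drives its own linear predictor
\log \mu_i = x_i^\top \beta, \qquad
\operatorname{logit}(\pi_i) = z_i^\top \gamma
```

The same two-part formula therefore maps to a completely different likelihood than in the instrumental variables case.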

My point is that a Formula can be interpreted however you wish if you are building the software. If you are the user, you need to understand the model being fitted before you can interpret the multipart Formula in terms of the statistical model. As such, there isn't an answer to your question: there is no single meaning in mathematical terms for a multipart Formula.

Gavin Simpson answered Nov 15 '22 19:11
