In the following example, say you have a model where supp is a factor variable:
lm(len ~ dose + supp, data = ToothGrowth)
but I want to use a different base level for the factor. I could specify this directly in the formula:
lm(len ~ dose + relevel(supp, "VC"), data = ToothGrowth)
and the output would be:
Call:
lm(formula = len ~ dose + relevel(supp, "VC"), data = ToothGrowth)
Coefficients:
(Intercept) dose relevel(supp, "VC")OJ
5.573 9.764 3.700
It is very convenient to do transformations directly in the formula instead of making intermediate data sets or altering the existing one. One example is using scale to standardize variables, where it is essential to account for missing values in the other variables included in the final model. Often, however, the resulting coefficient names in the output become quite ugly.
My question is: is it possible to specify the name of a variable that results from an expression when working with the formula? Something like
lm(len ~ dose + (OJ = relevel(supp, "VC")), data = ToothGrowth)
(which does not work).
EDIT: While the solution proposed by G. Grothendieck is nice, it actually produces the wrong result when some model variables have missing values. The following example shows this:
# Create some data:
df <- data.frame(x1 = runif(10), x2 = runif(10))
df <- transform(df, y = x1 + x2 + rnorm(10))
# Introduce some missing values in x1.
df$x1[2:3] <- NA
# The wrong result: scale(x2) is computed over all 10 rows,
# but lm() later drops rows 2 and 3 because x1 is NA there.
lm(formula = y ~ z1 + z2,
   data = transform(df, z1 = scale(x1), z2 = scale(x2)))
# Extract a model frame (only rows with no NA in y, x1 or x2).
df2 <- model.frame(y ~ x1 + x2, df)
# The right result:
lm(formula = y ~ scale(x1) + scale(x2),
   data = df2)
# or:
lm(formula = y ~ z1 + z2,
   data = transform(model.frame(y ~ x1 + x2, df),
                    z1 = scale(x1), z2 = scale(x2)))
The issue is that when standardizing x2, scale() also uses observations that are not in the final model, because lm() drops those rows for having a missing x1.
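To make the problem concrete, the sketch below compares the mean of the standardized x2 over the rows that lm() actually keeps, once for each version above. The seed and the object names wrong, right and used are arbitrary choices for illustration only.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(x1 = runif(10), x2 = runif(10))
df <- transform(df, y = x1 + x2 + rnorm(10))
df$x1[2:3] <- NA

# Standardize over all rows (the "wrong" version above)
wrong <- transform(df, z1 = scale(x1), z2 = scale(x2))
# Standardize over the model frame only (the "right" version above)
right <- transform(model.frame(y ~ x1 + x2, df),
                   z1 = scale(x1), z2 = scale(x2))

used <- complete.cases(df)  # rows lm() keeps (x1 is NA in rows 2 and 3)
mean(wrong$z2[used])        # not 0: the centering also used rows 2 and 3
mean(right$z2)              # 0 up to floating-point error
```

In the wrong version, z2 is no longer centered on the estimation sample, so the intercept and the "standardized" slopes refer to a different sample than the one actually fitted.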
So, to me, the question remains whether there is a way for the formula interface to handle this case without the annoying intermediate step of writing an extra formula and extracting a model frame, which can then be "transformed".
I hope the question is clear.
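One way to hide the intermediate step is a small wrapper that derives the variable list from the formula itself with all.vars() and drops incomplete rows before fitting. This is only a sketch: lm_cc is a hypothetical name, not an existing function, and it assumes every variable in the formula is a column of data (it does not handle `.` or variables taken from the calling environment).

```r
# Hypothetical helper (not part of base R): fit lm() on the complete cases of
# the variables named in the formula, so that in-formula transformations such
# as scale() only ever see the rows that end up in the model.
lm_cc <- function(formula, data) {
  vars <- all.vars(formula)               # e.g. c("y", "x1", "x2")
  keep <- complete.cases(data[vars])      # rows without NA in any model variable
  lm(formula, data = data[keep, , drop = FALSE])
}

# Same setup as in the question
df <- data.frame(x1 = runif(10), x2 = runif(10))
df <- transform(df, y = x1 + x2 + rnorm(10))
df$x1[2:3] <- NA

fit_cc <- lm_cc(y ~ scale(x1) + scale(x2), df)
# Reference fit using the explicit model-frame step from the question
fit_mf <- lm(y ~ scale(x1) + scale(x2), data = model.frame(y ~ x1 + x2, df))

all.equal(coef(fit_cc), coef(fit_mf))  # TRUE
```

Because the transformations stay in the formula, the coefficient names are still the expression text (for example scale(x1)); the wrapper only addresses the missing-value part of the question.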
Modify it in the data= argument rather than in the formula= argument:
lm(len ~ dose + OJ, data = transform(ToothGrowth, OJ = relevel(supp, "VC")))
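Note that because the new OJ column is still a factor, lm() pastes the column name and the level together, so the coefficient is reported as OJOJ rather than OJ. If a plain OJ is wanted, one option (a variation suggested here, not part of the answer above) is to build a numeric 0/1 indicator in transform(); with only two levels it gives exactly the same fit.

```r
# Factor column: lm() names the coefficient column name + level, i.e. "OJOJ"
fit_factor <- lm(len ~ dose + OJ,
                 data = transform(ToothGrowth, OJ = relevel(supp, "VC")))

# Numeric 0/1 indicator (1 for orange juice): the coefficient is named "OJ"
fit_dummy <- lm(len ~ dose + OJ,
                data = transform(ToothGrowth, OJ = as.numeric(supp == "OJ")))

names(coef(fit_factor))  # "(Intercept)" "dose" "OJOJ"
names(coef(fit_dummy))   # "(Intercept)" "dose" "OJ"
```

The indicator approach only suits two-level factors; for more levels, the factor version (and its pasted names) is the simpler choice.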