
R formulas and resulting coefficient names

Tags: r, formula

In the following example, say you have a model where supp is a factor variable.

lm(len ~ dose + supp, data = ToothGrowth)

Now suppose I want to use a different base level for the factor. I could specify this directly in the formula:

lm(len ~ dose + relevel(supp, "VC"), data = ToothGrowth)

and the output would be:

Call:
lm(formula = len ~ dose + relevel(supp, "VC"), data = ToothGrowth)

Coefficients:
      (Intercept)                   dose  relevel(supp, "VC")OJ  
            5.573                  9.764                  3.700 

It is very convenient to do transformations directly in the formula rather than create intermediate data sets or alter the existing one. One example is using scale to standardize variables, where it is essential to account for missing values in the other variables included in the final model. Often, however, the resulting coefficient names in the output become quite ugly.

My question is: is it possible to specify the name of a variable that results from an expression when working with the formula interface? Something like

lm(len ~ dose + (OJ = relevel(supp, "VC")), data = ToothGrowth)

(which does not work).

EDIT: While the solution proposed by G. Grothendieck is nice, it can produce the wrong result. The following example shows this:

# Create some data:
df <- data.frame(x1 = runif(10), x2=runif(10))
df <- transform(df,   y = x1 + x2 + rnorm(10))

# Introduce some missings.
df$x1[2:3] <- NA

# The wrong result:
lm(formula = y ~ z1 + z2, 
   data    = transform(df, z1 = scale(x1), z2=scale(x2)))

# extract a model frame.
df2 <- model.frame(y ~ x1 + x2, df)

# The right result:
lm(formula = y ~ scale(x1) + scale(x2), 
   data    = df2)

# or:
lm(formula = y ~ z1 + z2, 
   data    = transform(model.frame(y ~ x1 + x2, df), 
             z1 = scale(x1), z2 = scale(x2)))

The issue is that when demeaning x2, scale uses observations that are not in the final model, because those rows are dropped owing to the missing values in x1.

So to me the question remains whether there is a way for the formula interface to handle this case without the annoying intermediate step of using an extra formula and extracting a model frame, which can then be "transformed".
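
A quick way to see the discrepancy, as a minimal sketch: compare the mean of x2 over all rows with its mean over the rows that survive in the model frame. The seed and the object names (mf, mean_all_rows, mean_model_rows) are only illustrative assumptions, and the default na.action (na.omit) is assumed.

```r
# Same setup as above; the seed is an arbitrary choice for reproducibility.
set.seed(1)
df <- data.frame(x1 = runif(10), x2 = runif(10))
df <- transform(df, y = x1 + x2 + rnorm(10))
df$x1[2:3] <- NA

# Mean of x2 when scale() sees the full data set (all 10 rows).
mean_all_rows <- mean(df$x2)

# Mean of x2 over the rows that actually enter the model
# (rows 2 and 3 are dropped because x1 is missing there).
mf <- model.frame(y ~ x1 + x2, df)
mean_model_rows <- mean(mf$x2)

nrow(mf)
c(all_rows = mean_all_rows, model_rows = mean_model_rows)
```

The two means differ, so the centered x2 differs between the two approaches, and with it the fitted intercept.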

I hope the question is clear.

Stefan asked Mar 06 '12 14:03

1 Answer

Modify it in the data= argument rather than in the formula= argument:

lm(len ~ dose + OJ, data = transform(ToothGrowth, OJ = relevel(supp, "VC")))
G. Grothendieck answered Sep 26 '22 07:09
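
For the ToothGrowth case, where no rows are dropped, the two approaches should give the same fit and differ only in the coefficient names. A minimal sketch to check this, with fit_formula and fit_data as illustrative object names:

```r
# Releveling inside the formula: the coefficient name carries the whole expression.
fit_formula <- lm(len ~ dose + relevel(supp, "VC"), data = ToothGrowth)

# Releveling in the data argument: the coefficient name is the new
# variable name followed by the factor level.
fit_data <- lm(len ~ dose + OJ,
               data = transform(ToothGrowth, OJ = relevel(supp, "VC")))

names(coef(fit_formula))
names(coef(fit_data))

# The estimates themselves agree once the names are ignored.
all.equal(unname(coef(fit_formula)), unname(coef(fit_data)))
```

This equivalence relies on relevel not depending on which rows end up in the model; it does not carry over to scale when some rows are dropped for missing values, which is the case raised in the question's edit.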
