Significance of 'I' keyword in lm model in R [duplicate]

Question

I was fitting a linear model for an assignment:

lm(revenue ~ (max_cpc - max_cpc.mean), data = traffic)

But it throws an error:

Error in model.frame.default(formula = revenue ~ (max_cpc - max_cpc.mean),  : 
   variable lengths differ (found for 'max_cpc.mean')

Then, through trial and error, I slightly modified my code:

lm(revenue ~ I(max_cpc - max_cpc.mean), data = traffic)

and bingo, it worked!

Now I am trying to understand what I does and why it fixed my problem. Can anyone explain?

Glen_b · Accepted Answer

I() stops the formula interface from interpreting the operators inside its argument. The expression is evaluated as ordinary R code instead, and its result is used as a single predictor.
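One way to see what the formula interface makes of an expression is to look at the term labels that terms() produces. This is a minimal sketch using made-up variable names y, a and b; no data is needed, because terms() only parses the formula.

```r
# Without I(): '-' is formula syntax, so b is removed and only a remains
f_plain <- y ~ a - b
print(attr(terms(f_plain), "term.labels"))  # "a"

# With I(): the whole expression a - b is kept as one predictor
f_asis <- y ~ I(a - b)
print(attr(terms(f_asis), "term.labels"))   # "I(a - b)"
```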

In the formula interface, -x means 'remove x from the predictors'. So I can write y ~ . - x to mean 'fit y against everything but x'.
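A small sketch of that removal, on a hypothetical data frame df with a response y and two predictors x and z (the names and values are made up for illustration):

```r
# Hypothetical data: response y, predictors x and z
df <- data.frame(y = c(1, 3, 2, 5, 4),
                 x = c(2, 1, 4, 3, 5),
                 z = c(1, 2, 2, 4, 3))

# '.' expands to every column except the response; '- x' then drops x
tl <- attr(terms(y ~ . - x, data = df), "term.labels")
print(tl)  # "z"

# lm() sees the same thing: only z (plus the intercept) is fitted
fit <- lm(y ~ . - x, data = df)
print(names(coef(fit)))  # "(Intercept)" "z"
```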

You don't want that here: you actually want to make a variable that is the difference of two variables and regress on it, so you don't want the formula interface to parse that expression.

I() achieves that for you.
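The specific "variable lengths differ" message can be reproduced on made-up data. This sketch assumes (the question does not say) that max_cpc.mean is a single number such as mean(traffic$max_cpc) stored outside the data frame. Without I(), the formula treats max_cpc.mean as a term to remove, but model.frame() still evaluates every variable named in the formula, finds one of length 1 beside full-length columns, and stops. Inside I(), the subtraction is plain arithmetic, the scalar is recycled, and the result is one centred predictor.

```r
# Hypothetical stand-in for the question's 'traffic' data (values are made up)
traffic <- data.frame(
  revenue = c(10, 14, 9, 20, 17, 12),
  max_cpc = c(0.5, 0.8, 0.4, 1.2, 1.0, 0.6)
)
# Assumption: the mean is a single number stored outside the data frame
max_cpc.mean <- mean(traffic$max_cpc)

# Every name in the formula is still a variable model.frame() evaluates,
# including the one that '-' removes from the predictors
print(all.vars(revenue ~ (max_cpc - max_cpc.mean)))

# Without I(): max_cpc.mean has length 1, the columns have length 6
err <- tryCatch(
  lm(revenue ~ (max_cpc - max_cpc.mean), data = traffic),
  error = function(e) conditionMessage(e)
)
print(err)

# With I(): the subtraction runs as ordinary arithmetic (the scalar is
# recycled), giving one centred predictor
fit <- lm(revenue ~ I(max_cpc - max_cpc.mean), data = traffic)
print(coef(fit))

# Centring leaves the slope unchanged; the intercept becomes mean(revenue)
fit_raw <- lm(revenue ~ max_cpc, data = traffic)
print(c(centred = unname(coef(fit)[2]), raw = unname(coef(fit_raw)[2])))
```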

Terms with powers in them (x^2) need the same treatment. In a formula, ^ means crossing: (a + b)^2 expands to a + b + a:b, and x^2 on a single variable reduces to plain x. If you actually want a variable squared, you have to wrap it in I(), as in I(x^2).
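A short sketch of the difference, using made-up names a, b, x and y and made-up data where y is exactly x squared:

```r
# In a formula, '^' means "cross these terms up to this order"
print(attr(terms(y ~ (a + b)^2), "term.labels"))   # "a" "b" "a:b"

# On a single variable, x^2 collapses to plain x: no squared term is made
print(attr(terms(y ~ x^2), "term.labels"))         # "x"

# Inside I(), '^' is ordinary arithmetic, so x squared becomes a predictor
print(attr(terms(y ~ x + I(x^2)), "term.labels"))  # "x" "I(x^2)"

# Made-up data where y is exactly x squared
x <- 1:6
y <- x^2
fit_formula <- lm(y ~ x^2)     # same model as lm(y ~ x)
fit_asis    <- lm(y ~ I(x^2))  # regresses y on x squared
print(names(coef(fit_formula)))  # "(Intercept)" "x"
print(coef(fit_asis))
```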

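I() has other uses outside formulas as well; for example, wrapping a matrix in I() inside data.frame() keeps it as a single column instead of splitting it into several. See ?I.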
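As one illustration of a non-formula use, here is a sketch with a made-up 2 x 2 matrix m showing how I() changes what data.frame() does with it:

```r
m <- matrix(1:4, nrow = 2)

# Without I(): the matrix is split into separate columns m.1 and m.2
df_split <- data.frame(id = 1:2, m = m)
print(names(df_split))  # "id" "m.1" "m.2"

# With I(): the matrix is kept "as is" as a single matrix column
df_asis <- data.frame(id = 1:2, m = I(m))
print(names(df_asis))   # "id" "m"
print(class(df_asis$m))
```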

Tags:

r

heybhai

1 Answer

Glen_b
