
Significance of 'I' keyword in lm model in R [duplicate]

Tags:

r

I was creating a linear model for my assignment:

lm(revenue ~ (max_cpc - max_cpc.mean), data = traffic)  

But it throws:

Error in model.frame.default(formula = revenue ~ (max_cpc - max_cpc.mean),  : 
   variable lengths differ (found for 'max_cpc.mean') 

Then, through trial and error, I slightly modified my code:

lm(revenue ~ I(max_cpc - max_cpc.mean), data = traffic)

and bingo, it worked.

But now I am trying to figure out the significance of 'I' and how it fixed my problem. Can anyone explain it to me?

heybhai asked Dec 26 '22 08:12



1 Answer

I() stops the formula interface from interpreting its argument as formula syntax, so the expression inside is evaluated as ordinary arithmetic instead.

In the formula interface, -x means 'remove x from the predictors'. So I can write y ~ . - x to mean 'fit y against everything but x'.

You don't want it to do that - you actually want to make a variable that is the difference of two variables and regress on that, so you don't want the formula interface to parse that expression.

I() achieves that for you.
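
A minimal sketch of the difference, using made-up data. The traffic data frame below is simulated, and max_cpc.mean is assumed to be a single number stored outside the data frame, which would be consistent with the "variable lengths differ" error in the question:

```r
# Simulated stand-in for the asker's data (values are made up)
set.seed(42)
traffic <- data.frame(max_cpc = runif(30, 0.1, 2))
traffic$revenue <- 5 + 3 * traffic$max_cpc + rnorm(30)

# Assumption: max_cpc.mean is a length-1 value, not a column of traffic
max_cpc.mean <- mean(traffic$max_cpc)

# Without I(), '-' is formula syntax ("drop this term"), and the model
# frame is still built from max_cpc (length 30) and max_cpc.mean (length 1)
bad <- try(lm(revenue ~ (max_cpc - max_cpc.mean), data = traffic),
           silent = TRUE)
inherits(bad, "try-error")

# With I(), the subtraction is done arithmetically and the result
# becomes a single centred predictor
fit <- lm(revenue ~ I(max_cpc - max_cpc.mean), data = traffic)
names(coef(fit))

# Centring shifts the intercept only; the slope matches the uncentred fit
fit_raw <- lm(revenue ~ max_cpc, data = traffic)
all.equal(unname(coef(fit)[2]), unname(coef(fit_raw)[2]))
```

An alternative with the same effect is to compute the centred column first (for example traffic$max_cpc_c <- traffic$max_cpc - max_cpc.mean) and regress on that, which avoids I() entirely.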

Terms with squaring in them (x^2) need the same treatment. In a formula, ^ denotes crossing terms up to the given order rather than an arithmetic power, so x^2 on a single variable reduces to just x. If you actually want the variable squared, you have to wrap it in I().
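
As a small illustration with invented data (the data frame d and its columns are hypothetical), comparing the coefficient names shows which terms each formula actually fits:

```r
# Invented data with a quadratic relationship
set.seed(7)
d <- data.frame(x = 1:20)
d$y <- 2 + 0.5 * d$x^2 + rnorm(20)

# '^' is treated as formula syntax here, so the only term is x itself
fit_formula_power <- lm(y ~ x^2, data = d)
names(coef(fit_formula_power))

# I() protects the expression, so the predictor really is x squared
fit_squared <- lm(y ~ I(x^2), data = d)
names(coef(fit_squared))
```

The first call fits the same model as y ~ x, while the second regresses y on the squared values.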

I() has some other uses in other contexts as well. See ?I
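
One such use, sketched here with hypothetical names, is protecting an object when building a data frame so that a list is stored as a single list column:

```r
# I() marks an object with the class "AsIs"
x <- I(c("a", "b"))
inherits(x, "AsIs")

# Inside data.frame(), I() keeps the list as one column
# with one list element per row (df and vals are illustrative names)
df <- data.frame(id = 1:2, vals = I(list(1:3, letters[1:2])))
nrow(df)
inherits(df$vals, "AsIs")
length(df$vals[[1]])
```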

Glen_b answered Dec 29 '22 00:12
