I was creating a linear model for my assignment:
lm(revenue ~ (max_cpc - max_cpc.mean), data = traffic)
But it throws:
Error in model.frame.default(formula = revenue ~ (max_cpc - max_cpc.mean), :
variable lengths differ (found for 'max_cpc.mean')
Then, through trial and error, I slightly modified my code:
lm(revenue ~ I(max_cpc - max_cpc.mean), data = traffic)
And bingo, it worked!
But now I am trying to figure out the significance of 'I' and how it fixed my problem. Can anyone explain it to me?
I() prevents the formula interface from interpreting its argument, so the expression inside is evaluated as ordinary arithmetic instead.
In the formula interface, -x means 'remove x from the predictors'. So I can write y ~ . - x to mean 'fit y against everything but x'.
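To see the removal behaviour in isolation, here is a small sketch on made-up data (the data frame d and its columns y, x, z are hypothetical, not from the question). It only inspects which terms end up in the model.

```r
# Toy data; the column names are invented for illustration.
set.seed(1)
d <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))

# '.' expands to every column other than the response; '- x' then drops x.
fit <- lm(y ~ . - x, data = d)
names(coef(fit))                              # "(Intercept)" "z"

# Inside a formula, (z - x) is 'z with x removed', not an arithmetic difference.
minus_terms <- attr(terms(y ~ (z - x)), "term.labels")
minus_terms                                   # "z"
```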
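You don't want it to do that. You actually want to make a variable that is the difference of two variables and regress on that, so you don't want the formula interface to parse that expression. (The error you saw is consistent with this: max_cpc.mean was still looked up as a variable in its own right, and, presumably being a single number, its length did not match the other columns.) I() achieves that for you.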
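A minimal sketch of the working call, using synthetic data in place of the real traffic data frame (the values are invented, and max_cpc.mean is assumed here to be a single number, the mean of max_cpc):

```r
# Synthetic stand-in for the asker's data; the values are made up.
set.seed(42)
traffic <- data.frame(max_cpc = runif(50, 0.1, 2))
traffic$revenue <- 5 + 3 * traffic$max_cpc + rnorm(50)
max_cpc.mean <- mean(traffic$max_cpc)   # assumption: a length-one numeric

# I() makes the subtraction ordinary arithmetic, so the model gets one
# centered predictor instead of 'max_cpc with max_cpc.mean removed'.
fit_centered <- lm(revenue ~ I(max_cpc - max_cpc.mean), data = traffic)
fit_raw      <- lm(revenue ~ max_cpc, data = traffic)

# Centering shifts the intercept but leaves the slope unchanged.
slope_centered <- unname(coef(fit_centered)[2])
slope_raw      <- unname(coef(fit_raw)[2])
all.equal(slope_centered, slope_raw)
```

With a centered predictor, the intercept is the fitted revenue at the average max_cpc, which is often easier to interpret than the fitted revenue at max_cpc = 0.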
Terms with squaring in them (x^2) also need the same treatment. The formula interface treats ^ as crossing terms up to the given order, so (a + b)^2 means a + b + a:b, and if you actually want a variable squared you have to wrap it in I().
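The difference is easy to check by comparing coefficient names on toy data (again, d, x and y are hypothetical names used only for this sketch):

```r
# Toy data with a genuinely quadratic relationship.
set.seed(7)
d <- data.frame(x = seq(-2, 2, length.out = 30))
d$y <- 1 + 2 * d$x^2 + rnorm(30, sd = 0.1)

# Without I(), ^ is formula crossing, and x crossed with itself is just x.
plain_names <- names(coef(lm(y ~ x^2, data = d)))
plain_names     # "(Intercept)" "x"

# With I(), the square is computed arithmetically before fitting.
squared_names <- names(coef(lm(y ~ I(x^2), data = d)))
squared_names   # "(Intercept)" "I(x^2)"
```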
I() has some other uses in other contexts as well. See ?I.