What does the capital letter "I" in R linear regression formula mean?

I haven't been able to find an answer to this question, largely because googling anything with a standalone letter (like "I") causes issues.

What does the "I" do in a model like this?

data(rock)
lm(area ~ I(peri - mean(peri)), data = rock)

Considering that the following does NOT work:

lm(area ~ (peri - mean(peri)), data = rock)

and that this does work:

rock$peri - mean(rock$peri)

Any key words on how to research this myself would also be very helpful.

asked Jun 12 '14 19:06

Nancy

1 Answer

I isolates or insulates the contents of I( ... ) from the gaze of R's formula parsing code. It allows the standard R operators to work as they would if you used them outside of a formula, rather than being treated as special formula operators.

For example:

y ~ x + x^2

would, to R, mean "give me:

x = the main effect of x, and
x^2 = the main effect and the second order interaction of x",

not the intended x plus x-squared:

> model.frame( y ~ x + x^2, data = data.frame(x = rnorm(5), y = rnorm(5)))
           y           x
1 -1.4355144 -1.85374045
2  0.3620872 -0.07794607
3 -1.7590868  0.96856634
4 -0.3245440  0.18492596
5 -0.6515630 -1.37994358

This is because ^ is a special operator in a formula, as described in ?formula. You end up only including x in the model frame because the main effect of x is already included from the x term in the formula, and there is nothing to cross x with to get the second-order interactions in the x^2 term.
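One way to check this for yourself, without fitting a model, is to ask R which terms it extracts from a formula. The sketch below uses terms(), which only parses the formula, so the variables do not need to exist. The helper name labels_of is made up for illustration.

```r
# Hypothetical helper: list the terms R extracts from a formula.
# terms() only parses the formula, so x, y, a and b need not exist.
labels_of <- function(f) attr(terms(f), "term.labels")

# x crossed with itself adds nothing beyond the main effect of x.
labels_of(y ~ x + x^2)      # "x"

# With two variables, ^2 expands to main effects plus their interaction.
labels_of(y ~ (a + b)^2)    # "a" "b" "a:b"
```

The second call shows what ^ is actually for in a formula: crossing terms up to the given order.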

To get the usual operator, you need to use I() to isolate the call from the formula code:

> model.frame( y ~ x + I(x^2), data = data.frame(x = rnorm(5), y = rnorm(5)))
            y          x       I(x^2)
1 -0.02881534  1.0865514 1.180593....
2  0.23252515 -0.7625449 0.581474....
3 -0.30120868 -0.8286625 0.686681....
4 -0.67761458  0.8344739 0.696346....
5  0.65522764 -0.9676520 0.936350....

(That last column is correct; it just looks odd because it is of class AsIs.)

In your example, - when used in a formula indicates removal of a term from the model, whereas you wanted - to have its usual binary-operator meaning of subtraction:

> model.frame( y ~ x - mean(x), data = data.frame(x = rnorm(5), y = rnorm(5)))
Error in model.frame.default(y ~ x - mean(x), data = data.frame(x = rnorm(5),  :
  variable lengths differ (found for 'mean(x)')

This fails because mean(x) is a length-1 vector, and model.frame() quite rightly tells you that this doesn't match the length of the other variables. A way around this is I():

> model.frame( y ~ I(x - mean(x)), data = data.frame(x = rnorm(5), y = rnorm(5)))
           y I(x - mean(x))
1  1.1727063   1.142200....
2 -1.4798270   -0.66914....
3 -0.4303878   -0.28716....
4 -1.0516386   0.542774....
5  1.5225863   -0.72865....

Hence, where you want to use an operator that has special meaning in a formula, but you need its non-formula meaning, you need to wrap the elements of the operation in I( ).
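Applied to the rock data from the question, a minimal sketch might look like the following. It compares the I() version with a model fitted on a centred column computed beforehand. The names rock_c and peri_c are made up for illustration.

```r
data(rock)  # built-in dataset shipped with R

# Centre the predictor inside the formula; I() makes '-' mean subtraction.
fit_inline <- lm(area ~ I(peri - mean(peri)), data = rock)

# Alternative: compute the centred column first, then use it as a plain term.
rock_c <- transform(rock, peri_c = peri - mean(peri))
fit_column <- lm(area ~ peri_c, data = rock_c)

# The coefficient names differ between the fits, so compare unnamed values.
all.equal(unname(coef(fit_inline)), unname(coef(fit_column)))
```

Both routes should give the same fit. The I() form just saves you from adding a column to the data frame.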

Read ?formula for more on the special operators, and ?I for more details on the function itself and its other main use-case within data frames (which is where the AsIs bit originates from, if you are interested).
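As a rough illustration of the data frame use-case, the sketch below wraps a list in I() so that data.frame() stores it as a single list column. The object names sq and df are arbitrary.

```r
# I() marks its argument with the class "AsIs" and otherwise leaves it alone.
sq <- I((1:3)^2)
inherits(sq, "AsIs")

# In data.frame(), an AsIs list is kept as one list column
# instead of being split into one column per list element.
df <- data.frame(id = 1:2, vals = I(list(1:3, c("a", "b"))))
dim(df)
class(df$vals)
```

This is the same AsIs class that makes the I(x^2) column print oddly in the model frames above.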

answered Sep 20 '22 10:09

Gavin Simpson

Tags: r, formula, regression, polynomials
