 

What does the R formula y~1 mean?

Tags:

r

r-formula

I was reading the documentation on R formulas, trying to figure out how to work with depmix (from the depmixS4 package).

Now, in the depmixS4 documentation, a sample formula tends to be something like y ~ 1. For a simple case like y ~ x, it defines a relationship between input x and output y, so I understand it is similar to y = a * x + b, where a is the slope and b is the intercept.
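For example, here is how I read y ~ x in code (the data frame toy below is made up purely for illustration):

  # made-up data, only to illustrate how I read y ~ x
  toy <- data.frame(x = 1:10)
  toy$y <- 3 + 2 * toy$x + rnorm(10)

  fit <- lm(y ~ x, data = toy)
  coef(fit)   # an (Intercept) of roughly 3 (b) and a slope for x of roughly 2 (a)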

If we go back to y ~ 1, the formula is throwing me off. Is it equivalent to y = 1 (a horizontal line at y = 1)?

To add a bit of context, if you look at the depmixS4 documentation, there is an example like the one below:

depmix(list(rt~1,corr~1),data=speed,nstates=2,family=list(gaussian(),multinomial()))
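For what it's worth, I would expect to fit that model roughly like this (a sketch based on the depmixS4 help pages; the speed data, fit() and summary() all come from that package):

  library(depmixS4)
  data(speed)

  # two-state model: Gaussian response for rt, multinomial for corr,
  # each response modelled with an intercept only (~ 1)
  mod <- depmix(list(rt ~ 1, corr ~ 1), data = speed, nstates = 2,
                family = list(gaussian(), multinomial()))
  fitted <- fit(mod)
  summary(fitted)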

In general, formulas that end with ~ 1 are confusing to me. Can anyone explain what ~ 1 or y ~ 1 means?

Antony asked Nov 13 '12


2 Answers

Many of the operators used in model formulae in R (asterisk, plus, caret) have a model-specific meaning, and this is one of them: the 'one' symbol indicates an intercept.

In other words, it is the value the dependent variable is expected to have when the independent variables are zero or have no influence. (To use the more common arithmetic meaning of operators inside model terms, wrap them in I().) Intercepts are usually assumed, so you most often see the 1 written out when explicitly specifying a model without an intercept.
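As a small sketch of the I() point (the data frame dat here is made up purely for illustration):

  dat <- data.frame(x = 1:10, y = rnorm(10))

  # inside a formula, x^2 means "x crossed with itself", which collapses to x,
  # so wrap it in I() to get the arithmetic square as a predictor
  coef(lm(y ~ x + x^2,    data = dat))   # intercept and x only
  coef(lm(y ~ x + I(x^2), data = dat))   # intercept, x, and x squared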

Here are two ways of specifying the same model for a linear regression model of y on x. The first has an implicit intercept term, and the second an explicit one:

  y ~ x
  y ~ 1 + x
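For example, with some toy data (the names are arbitrary), both forms produce identical coefficients:

  d <- data.frame(x = 1:10, y = rnorm(10, mean = 1:10))   # toy data

  coef(lm(y ~ x,     data = d))
  coef(lm(y ~ 1 + x, data = d))   # same fit: implicit vs explicit intercept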

Here are ways to give a linear regression of y on x through the origin (that is, without an intercept term):

  y ~ 0 + x
  y ~ -1 + x
  y ~ x - 1
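All three drop the intercept; for example (again with made-up data):

  d <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))   # toy data

  coef(lm(y ~ 0 + x,  data = d))   # slope only, no (Intercept) term
  coef(lm(y ~ -1 + x, data = d))   # same model
  coef(lm(y ~ x - 1,  data = d))   # same model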

In the specific case you mention (y ~ 1), y is being predicted by no other variable, so the natural prediction is the mean of y, as Paul Hiemstra stated:

  > data(city)
  > r <- lm(x ~ 1, data = city)
  > r

  Call:
  lm(formula = x ~ 1, data = city)

  Coefficients:
  (Intercept)
         97.3

  > mean(city$x)
  [1] 97.3

And removing the intercept with a -1 leaves you with nothing:

  > r <- lm(x ~ -1, data = city)
  > r

  Call:
  lm(formula = x ~ -1, data = city)

  No coefficients

formula() is a function for extracting a formula from an object, so its help file isn't the best place to read about specifying model formulae in R. I suggest you look at this explanation or Chapter 11 of An Introduction to R.
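To see what formula() itself does (extract a formula from a fitted object, rather than specify one), reusing the city data loaded above:

  > r <- lm(x ~ 1, data = city)
  > formula(r)
  x ~ 1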

MattBagg answered Oct 20 '22


If your model were of the form y ~ x1 + x2, this (roughly speaking) represents:

  y = β0 + β1(x1) + β2(x2)

which is of course the same as

  y = β0(1) + β1(x1) + β2(x2)

There is an implicit +1 in the above formula. So really, the formula above is y ~ 1 + x1 + x2
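A quick check with toy data (the names x1 and x2 are just placeholders):

  set.seed(1)
  dat <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
  dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(20)

  coef(lm(y ~ x1 + x2,     data = dat))
  coef(lm(y ~ 1 + x1 + x2, data = dat))   # identical: the 1 is implicit in the first form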

We could have a very simple formula, whereby y does not depend on any other variable. This is the formula you are asking about, y ~ 1, which roughly equates to

 y = β0(1) = β0 

As @Paul points out, when you fit this simple model, you get β0 = mean(y).




Here is an example:

  # Let's make a small sample data frame
  dat <- data.frame(y = (-2):3, x = 3:8)

  # Create the linear model as above
  simpleModel <- lm(y ~ 1, data = dat)

  ## COMPARE THE COEFFICIENTS OF THE MODEL TO THE MEAN(y)
  simpleModel$coef
  # (Intercept)
  #         0.5

  mean(dat$y)
  # [1] 0.5
Ricardo Saporta answered Oct 20 '22