I was reading the documentation on R Formula, and trying to figure out how to work with depmix (from the depmixS4 package).
Now, in the documentation of depmixS4, a sample formula tends to be something like y ~ 1.
For a simple case like y ~ x, it defines a relationship between input x and output y, so I get that it is similar to y = a * x + b, where a is the slope and b is the intercept.
If we go back to y ~ 1, the formula throws me off. Is it equivalent to y = 1 (a horizontal line at y = 1)?
To add a bit of context: if you look at the depmixS4 documentation, there is this example:
depmix(list(rt~1,corr~1),data=speed,nstates=2,family=list(gaussian(),multinomial()))
In general, formulas that end with ~ 1 are confusing to me. Can anyone explain what ~ 1 or y ~ 1 means?
I() isolates or insulates the contents of I(...) from R's formula-parsing code. It allows the standard arithmetic operators to work as they would outside of a formula, rather than being treated as special formula operators.
Many of the operators used in model formulae in R (asterisk, plus, caret) have a model-specific meaning, and 1 is one of them: it indicates an intercept.
In other words, the intercept is the value the dependent variable is expected to have when the independent variables are zero or have no influence. (To use the more common arithmetic meaning of these operators, you wrap them in I().) Intercepts are usually assumed, so it is most common to see 1 in the context of explicitly stating a model without an intercept.
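A quick sketch of the I() distinction (my own illustration, with made-up data): inside a formula the caret is a crossing operator, so x^2 collapses to just x, while I(x^2) keeps the arithmetic meaning and adds a genuine quadratic term.

```r
set.seed(1)
dat <- data.frame(x = 1:10, y = (1:10)^2 + rnorm(10))

# Without I(), ^ is a formula operator: x^2 means "x crossed with x",
# which simplifies to x, so only an intercept and x are fitted.
m1 <- lm(y ~ x + x^2, data = dat)

# With I(), ^ keeps its arithmetic meaning, so a quadratic term is fitted.
m2 <- lm(y ~ x + I(x^2), data = dat)

length(coef(m1))  # 2 coefficients: (Intercept), x
length(coef(m2))  # 3 coefficients: (Intercept), x, I(x^2)
```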
Here are two ways of specifying the same model for a linear regression model of y on x. The first has an implicit intercept term, and the second an explicit one:
y ~ x
y ~ 1 + x
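You can check the equivalence yourself (a small sketch using R's built-in cars data, not part of the original answer): both forms fit exactly the same model.

```r
# Implicit vs. explicit intercept: identical fits
m_implicit <- lm(dist ~ speed,     data = cars)
m_explicit <- lm(dist ~ 1 + speed, data = cars)

all.equal(coef(m_implicit), coef(m_explicit))  # TRUE
```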
Here are ways to give a linear regression of y on x through the origin (that is, without an intercept term):
y ~ 0 + x
y ~ -1 + x
y ~ x - 1
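Again a quick sketch (my own, using the built-in cars data): all three no-intercept spellings give the same single-coefficient fit through the origin.

```r
# Three equivalent ways to drop the intercept
m0a <- lm(dist ~ 0 + speed,  data = cars)
m0b <- lm(dist ~ -1 + speed, data = cars)
m0c <- lm(dist ~ speed - 1,  data = cars)

coef(m0a)  # a single slope, no (Intercept) term
all.equal(coef(m0a), coef(m0b))  # TRUE
all.equal(coef(m0b), coef(m0c))  # TRUE
```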
In the specific case you mention (y ~ 1), y is being predicted by no other variable, so the natural prediction is the mean of y, as Paul Hiemstra stated:
> data(city)
> r <- lm(x ~ 1, data = city)
> r

Call:
lm(formula = x ~ 1, data = city)

Coefficients:
(Intercept)
       97.3

> mean(city$x)
[1] 97.3
And removing the intercept with a -1 leaves you with nothing:
> r <- lm(x ~ -1, data = city)
> r

Call:
lm(formula = x ~ -1, data = city)

No coefficients
formula() is a function for extracting formulas out of objects, and its help file isn't the best place to read about specifying model formulae in R. I suggest you look at this explanation or Chapter 11 of An Introduction to R.
If your model were of the form y ~ x1 + x2, this (roughly speaking) represents:

y = β0 + β1(x1) + β2(x2)

which is of course the same as

y = β0(1) + β1(x1) + β2(x2)
There is an implicit + 1 in the above formula, so really it is y ~ 1 + x1 + x2.
We could have a very simple formula in which y does not depend on any other variable. This is the formula you are referencing, y ~ 1, which roughly equates to
y = β0(1) = β0
As @Paul points out, when you fit this simple model, you get β0 = mean(y):
# Let's make a small sample data frame
dat <- data.frame(y = (-2):3, x = 3:8)

# Create the linear model as above
simpleModel <- lm(y ~ 1, data = dat)

# Compare the coefficient of the model to mean(y)
simpleModel$coef
# (Intercept)
#         0.5
mean(dat$y)
# [1] 0.5