I was reading the documentation on R Formula, and trying to figure out how to work with depmix (from the depmixS4 package).
Now, in the documentation of depmixS4, a sample formula tends to be something like y ~ 1.
For a simple case like y ~ x, it defines a relationship between input x and output y, so I get that it is similar to y = a * x + b, where a is the slope and b is the intercept.
If we go back to y ~ 1, the formula throws me off. Is it equivalent to y = 1 (a horizontal line at y = 1)?
To add a bit of context: if you look at the depmixS4 documentation, there is this example:
depmix(list(rt~1,corr~1),data=speed,nstates=2,family=list(gaussian(),multinomial()))
In general, formulas that end with ~ 1 are confusing to me. Can anyone explain what ~ 1 or y ~ 1 means?
I() isolates or insulates the contents of I(...) from R's formula-parsing code. It allows the standard arithmetic operators to work as they would outside of a formula, rather than being treated as special formula operators.
Many of the operators used in model formulae in R (asterisk, plus, caret) have a model-specific meaning, and 1 is one of them: it indicates an intercept.
In other words, the intercept is the value the dependent variable is expected to have when the independent variables are zero or have no influence. (To use the more common arithmetic meaning of these operators, you wrap them in I().) Intercepts are usually assumed, so it is most common to see 1 in the context of explicitly stating a model without an intercept.
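A quick sketch of the I() distinction (my own illustration, with made-up data): inside a formula the caret is a crossing operator, so x^2 collapses to just x, while I(x^2) keeps the arithmetic meaning and adds a genuine quadratic term.

```r
set.seed(1)
dat <- data.frame(x = 1:10, y = (1:10)^2 + rnorm(10))

# Without I(), ^ is a formula operator: x^2 means "x crossed with x",
# which simplifies to x, so only an intercept and x are fitted.
m1 <- lm(y ~ x + x^2, data = dat)

# With I(), ^ keeps its arithmetic meaning, so a quadratic term is fitted.
m2 <- lm(y ~ x + I(x^2), data = dat)

length(coef(m1))  # 2 coefficients: (Intercept), x
length(coef(m2))  # 3 coefficients: (Intercept), x, I(x^2)
```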
Here are two ways of specifying the same model for a linear regression model of y on x. The first has an implicit intercept term, and the second an explicit one:
y ~ x
y ~ 1 + x
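You can check the equivalence yourself (a small sketch using R's built-in cars data, not part of the original answer): both forms fit exactly the same model.

```r
# Implicit vs. explicit intercept: identical fits
m_implicit <- lm(dist ~ speed,     data = cars)
m_explicit <- lm(dist ~ 1 + speed, data = cars)

all.equal(coef(m_implicit), coef(m_explicit))  # TRUE
```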
Here are ways to give a linear regression of y on x through the origin (that is, without an intercept term):
y ~ 0 + x
y ~ -1 + x
y ~ x - 1
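Again a quick sketch (my own, using the built-in cars data): all three no-intercept spellings give the same single-coefficient fit through the origin.

```r
# Three equivalent ways to drop the intercept
m0a <- lm(dist ~ 0 + speed,  data = cars)
m0b <- lm(dist ~ -1 + speed, data = cars)
m0c <- lm(dist ~ speed - 1,  data = cars)

coef(m0a)  # a single slope, no (Intercept) term
all.equal(coef(m0a), coef(m0b))  # TRUE
all.equal(coef(m0b), coef(m0c))  # TRUE
```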
In the specific case you mention (y ~ 1), y is being predicted by no other variable, so the natural prediction is the mean of y, as Paul Hiemstra stated:
> data(city)
> r <- lm(x ~ 1, data = city)
> r

Call:
lm(formula = x ~ 1, data = city)

Coefficients:
(Intercept)
       97.3

> mean(city$x)
[1] 97.3
And removing the intercept with a -1 leaves you with nothing:
> r <- lm(x ~ -1, data = city)
> r

Call:
lm(formula = x ~ -1, data = city)

No coefficients
formula() is a function for extracting formulas out of objects, and its help file isn't the best place to read about specifying model formulae in R. I suggest you look at this explanation or Chapter 11 of An Introduction to R.
If your model were of the form y ~ x1 + x2, this (roughly speaking) represents:

y = β0 + β1(x1) + β2(x2)

which is of course the same as

y = β0(1) + β1(x1) + β2(x2)
There is an implicit + 1 in the above formula, so really it is y ~ 1 + x1 + x2.
We could have a very simple formula in which y does not depend on any other variable. This is the formula you are referencing, y ~ 1, which roughly equates to
y = β0(1) = β0
As @Paul points out, when you fit this simple model, you get β0 = mean(y):
# Let's make a small sample data frame
dat <- data.frame(y = (-2):3, x = 3:8)

# Create the linear model as above
simpleModel <- lm(y ~ 1, data = dat)

# Compare the coefficient of the model to mean(y)
simpleModel$coef
# (Intercept)
#         0.5
mean(dat$y)
# [1] 0.5