Comparing two linear models with anova() in R [closed]

Tags:

r

linear-regression

regression

anova

I don't quite understand what the p-value in this output means. I don't mean p-values as such, but in this case.

> Model 1: sl ~ le + ky 
> Model 2: sl ~ le   
  Res.Df     RSS Df   Sum of Sq      F Pr(>F) 
1     97 0.51113                              
2     98 0.51211 -1 -0.00097796 0.1856 0.6676

I get something like that, and now I am wondering which model is the better fit. As there is only ONE and not TWO p-values, I'm getting confused. I get different p-values using summary(model1) or summary(model2).

Now if

> fm2<-lm(Y~X+T)

(T being my indicator variable) and

> fm4<-lm(Y~X)

if I do

> anova(fm2,fm4)

this tests the null hypothesis H0: alpha1==alpha2 (Ha: alpha1!=alpha2), alpha being my intercept. So it tests whether it is better to have one intercept (=> alpha1==alpha2) or two intercepts (alpha1!=alpha2).

In this case we would now obviously reject the null hypothesis, as the p-value is 0.6676.

This would mean we should rather stick with model fm4, as it is more appropriate for our data.

Did I draw the right conclusions? I tried my very best, but I am not sure what the p-value means. As there is only one, this is what I thought it might mean. Can someone clear things up?

761

asked Oct 12 '12 16:10

lisa

1 Answer

Do you mean "would not obviously reject the null hypothesis" (rather than "now obviously reject")? That would seem to make more sense given the rest of your question.

There is only one p-value because there are two models to compare, hence a single comparison (null hypothesis vs alternative, or really in this case null hypothesis vs unspecified alternative). It sounds from what you have said above as though le is a continuous predictor and ky is a categorical one, in which case you are comparing a model with one slope and one intercept against (as you said) a model with a single slope and two intercepts. Because the p-value is relatively large, the data do not provide evidence for an additive effect of ky. The simpler model would generally be more appropriate (although be careful with this conclusion, as p-values are constructed to test hypotheses, not to choose among models).
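To see where that single p-value comes from, here is a minimal sketch that recomputes the F statistic by hand from the numbers printed in the table in the question. The variable names (rss_full, df_full, ss_diff and so on) are made up for illustration; only pf() is a base R function.

```r
# Numbers copied from the anova() table in the question
rss_full   <- 0.51113      # residual sum of squares, Model 1: sl ~ le + ky
df_full    <- 97           # residual df of Model 1
df_reduced <- 98           # residual df of Model 2: sl ~ le
ss_diff    <- 0.00097796   # extra RSS when ky is dropped ("Sum of Sq", sign ignored)

# F = (extra RSS per dropped parameter) / (residual variance of the larger model)
df_diff <- df_reduced - df_full
f_stat  <- (ss_diff / df_diff) / (rss_full / df_full)

# Upper tail of the F distribution with (df_diff, df_full) degrees of freedom
p_val <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

The result should agree with the F and Pr(>F) columns of the table up to rounding. Dropping ky raises the residual sum of squares by very little relative to the residual variance, which is why F is small and the p-value is large.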

The p-values you get for summary() of each individual model are the p-values for the effects of each of the parameters in each model, conditional on all the other parameters in that model. If your data are perfectly balanced (which is unlikely in a regression design), you should get the same answers from summary and anova, but otherwise the results from anova are generally preferable.
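As a rough illustration of how the two kinds of p-value relate, here is a sketch on simulated data (the data frame d, the seed and the effect sizes are all invented for this example; Y, X, T, fm2 and fm4 follow the names in the question). In the special case where the two models differ by a single coefficient, the F test from anova() on the pair of models and the t test for that coefficient in summary() of the larger model are the same test (F = t^2), so the p-values should coincide.

```r
set.seed(1)                                   # simulated data, purely illustrative
n <- 100
d <- data.frame(X = rnorm(n),
                T = rep(c(0, 1), each = n / 2))   # 0/1 indicator variable
d$Y <- 1 + 0.5 * d$X + rnorm(n, sd = 0.1)         # generated with no effect of T

fm2 <- lm(Y ~ X + T, data = d)   # common slope, two intercepts
fm4 <- lm(Y ~ X, data = d)       # one slope, one intercept

# p-value from comparing the two nested models
p_anova <- anova(fm4, fm2)[2, "Pr(>F)"]

# p-value for the T coefficient, conditional on X, in the larger model
p_summary <- coef(summary(fm2))["T", "Pr(>|t|)"]

print(all.equal(p_anova, p_summary))
```

The other p-values in summary(fm2) and summary(fm4) answer different questions (for example, the effect of X with or without T in the model), which is one reason they differ between the two summaries.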

This question is probably more appropriate for http://stats.stackexchange.com, as it is really about statistical interpretation rather than programming ...

129

answered Sep 24 '22 07:09

Ben Bolker
