Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to subset a range of values in lm()

Tags:

r

subset

The help file for lm() doesn't go into the syntax for the subset argument. I am not sure how to get it to find the line of best fit for only a portion of my data set. This question is similar, but I wasn't able to solve my particular problem using it. How does the subset argument work in the lm() function?

Here is my code:

    with(dat[dat$SIZE <7 & dat$SIZE > 0.8 ,], plot(SP.RICH~SIZE, log="x",
      xlim=c(1,9), ylim=c(60,180), ylab="plant species richness", 
      xlab="log area (ha)", type="n"))
   with(dat[dat$SIZE <7 & dat$SIZE > 0.8 ,], points(SP.RICH~SIZE, pch=20, cex=1))
   fit=lm(SP.RICH~SIZE, subset=c(1:7))

I would like to make sure that the regression line is drawn only for the values that I subset above in the plot() and points() commands.

like image 210
eyerah Avatar asked Oct 13 '15 22:10

eyerah


People also ask

What is subset in lm in R?

The subset parameter in lm() and other model fitting functions takes as its argument a logical vector the length of the dataframe, evaluated in the environment of the dataframe.

What is the general format of the lm () function?

This function uses the following basic syntax: lm(formula, data, …) where: formula: The formula for the linear model (e.g. y ~ x1 + x2)

How do you analyze an lm in R?

Summary: R linear regression uses the lm() function to create a regression model given some formula, in the form of Y~X+X2. To look at the model, you use the summary() function. To analyze the residuals, you pull out the $resid variable from your new model.


2 Answers

The subset parameter in lm() and other model fitting functions takes as its argument a logical vector the length of the dataframe, evaluated in the environment of the dataframe. So, if I understand you correctly, I would use the following:

fit <- lm(SP.RICH~SIZE, data=dat, subset=(SIZE>0.8 & SIZE<7))
like image 116
tom 2 Avatar answered Oct 30 '22 20:10

tom 2


But the above solution does not help if you want to run one lm for each group in your data - lets say that you have different countries as a column and you want to understand the relationship between richness and size within each country.

For that I recommend following the help for the function by in R http://astrostatistics.psu.edu/su07/R/html/base/html/by.html:

require(stats)
attach(warpbreaks)
by(warpbreaks[, 1:2], tension, summary)
by(warpbreaks[, 1], list(wool = wool, tension = tension), summary)
by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))

## now suppose we want to extract the coefficients by group
tmp <- by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))
sapply(tmp, coef)

From the list tmp you can extract any lm parameters you like.

like image 42
AEM Avatar answered Oct 30 '22 18:10

AEM