How to subset a range of values in lm()

Tags: r, subset

The help file for lm() doesn't go into the syntax for the subset argument. I am not sure how to get it to find the line of best fit for only a portion of my data set. This question is similar, but I wasn't able to solve my particular problem using it. How does the subset argument work in the lm() function?

Here is my code:

    with(dat[dat$SIZE <7 & dat$SIZE > 0.8 ,], plot(SP.RICH~SIZE, log="x",
      xlim=c(1,9), ylim=c(60,180), ylab="plant species richness",
      xlab="log area (ha)", type="n"))
    with(dat[dat$SIZE <7 & dat$SIZE > 0.8 ,], points(SP.RICH~SIZE, pch=20, cex=1))
    fit=lm(SP.RICH~SIZE, subset=c(1:7))

I would like to make sure that the regression line is drawn only for the values that I subset above in the plot() and points() commands.

210

asked Oct 13 '15 22:10

eyerah

2 Answers

The subset argument of lm() and other model-fitting functions takes a vector that selects rows of the data frame, most commonly a logical vector with one element per row. It is evaluated in the environment of the data frame passed as data, so columns can be referred to by name. So, if I understand you correctly, I would use the following:

    fit <- lm(SP.RICH~SIZE, data=dat, subset=(SIZE>0.8 & SIZE<7))
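To check that this does what you expect, here is a minimal sketch on simulated data. The data frame below is an assumption: it only borrows the names dat, SIZE and SP.RICH from the question, and the values are made up. It compares the subset fit with a fit on pre-filtered rows and with a fit using row numbers, then draws the fitted line over the subset range only.

```r
# Simulated stand-in for the asker's data (values are invented)
set.seed(1)
dat <- data.frame(SIZE = seq(0.5, 9, by = 0.5))
dat$SP.RICH <- 80 + 12 * dat$SIZE + rnorm(nrow(dat), sd = 5)

# subset is evaluated inside 'data', so SIZE means dat$SIZE here
fit <- lm(SP.RICH ~ SIZE, data = dat, subset = SIZE > 0.8 & SIZE < 7)

# Same rows, selected before fitting
keep <- dat$SIZE > 0.8 & dat$SIZE < 7
fit_rows <- lm(SP.RICH ~ SIZE, data = dat[keep, ])

# Same rows again, given as row numbers instead of a logical vector
fit_idx <- lm(SP.RICH ~ SIZE, data = dat, subset = which(keep))

# All three fits should agree
stopifnot(isTRUE(all.equal(coef(fit), coef(fit_rows))))
stopifnot(isTRUE(all.equal(coef(fit), coef(fit_idx))))

# Plot only the subset and draw the line only across those points
# (SIZE is already sorted in this simulated data)
plot(SP.RICH ~ SIZE, data = dat[keep, ], pch = 20)
lines(dat$SIZE[keep], fitted(fit))
```

Using lines() with fitted(fit) rather than abline(fit) keeps the line from extending past the subset, since fitted() returns one value per row that was actually used in the fit.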

116

answered Oct 30 '22 20:10

tom 2

The solution above does not help, however, if you want to run one lm for each group in your data. Let's say you have a column of countries and want to understand the relationship between richness and size within each country.

For that, I recommend following the help page for the by function in R (http://astrostatistics.psu.edu/su07/R/html/base/html/by.html):

    require(stats)
    attach(warpbreaks)
    by(warpbreaks[, 1:2], tension, summary)
    by(warpbreaks[, 1], list(wool = wool, tension = tension), summary)
    by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))

    ## now suppose we want to extract the coefficients by group
    tmp <- by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))
    sapply(tmp, coef)

From the list tmp you can extract any lm parameters you like.
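As a rough illustration of pulling other quantities out of tmp, the sketch below refits the per-group models without attach() (my own variation on the help-page code, using warpbreaks$tension as the grouping vector) and collects a few summaries. The names coefs, r2 and n are just placeholders.

```r
# warpbreaks ships with R in the datasets package
tmp <- by(warpbreaks, warpbreaks$tension,
          function(x) lm(breaks ~ wool, data = x))

# One column per tension level, one row per coefficient
coefs <- sapply(tmp, coef)

# R-squared of each group's model
r2 <- sapply(tmp, function(m) summary(m)$r.squared)

# Number of observations used in each group's fit
n <- sapply(tmp, nobs)

print(coefs)
print(r2)
print(n)
```

Any function that accepts a single lm object (summary, confint, residuals, and so on) can be applied to each element in the same way.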

answered Oct 30 '22 18:10

AEM
