I always thought that the lm function was extremely fast in R, but as this example suggests, the closed-form solution computed with the solve function is much faster.
data <- data.frame(y = rnorm(1000), x1 = rnorm(1000), x2 = rnorm(1000))
X <- cbind(1, data$x1, data$x2)  # design matrix with an intercept column

library(microbenchmark)
microbenchmark(
  solve(t(X) %*% X, t(X) %*% data$y),
  lm(y ~ ., data = data))
Can someone explain whether this toy example is just a bad comparison, or whether lm really is this slow?
EDIT: As Dirk Eddelbuettel pointed out, lm also has to resolve the formula, so the comparison is unfair. It is better to compare against lm.fit, which does not need to resolve a formula:
microbenchmark(
  solve(t(X) %*% X, t(X) %*% data$y),
  lm.fit(X, data$y))
Unit: microseconds
                               expr     min      lq     mean   median       uq     max neval cld
 solve(t(X) %*% X, t(X) %*% data$y)  99.083 108.754 125.1398 118.0305 131.2545 236.060   100   a
                       lm.fit(X, y) 125.136 136.978 151.4656 143.4915 156.7155 262.114   100   b
The lm() function fits linear models to data frames in R. It can be used to carry out regression, single-stratum analysis of variance, and analysis of covariance, and the fitted model can then predict responses for data not in the original data frame.
lm solves the linear least squares problem using QR factorization, a direct rather than iterative method, as its documentation states.
In the lm summary, the residual standard error is essentially the standard deviation of the residuals with a slight twist: instead of dividing the sum of squared residuals by n - 1, it divides by n - p, where p is the number of estimated coefficients (including the intercept).
The basic syntax is lm(formula, data, ...), where formula specifies the linear model (e.g. y ~ x1 + x2) and data is the data frame containing the variables.
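To make both points concrete, here is a minimal sketch reusing data and X from the question (the object names beta_ne, beta_qr, and fit are illustrative):

# normal-equations solution, as in the question
beta_ne <- solve(t(X) %*% X, t(X) %*% data$y)
# QR-based least-squares solution, the approach lm/lm.fit use internally
beta_qr <- qr.solve(X, data$y)
all.equal(as.vector(beta_ne), as.vector(beta_qr))   # TRUE up to numerical error

# residual standard error: residual sum of squares divided by n - p
fit <- lm(y ~ x1 + x2, data = data)
n <- nrow(data)
p <- length(coef(fit))                              # 3: intercept, x1, x2
all.equal(summary(fit)$sigma, sqrt(sum(resid(fit)^2) / (n - p)))   # TRUE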
You are overlooking that

- solve() only returns your parameters
- lm() returns you a (very rich) object with many components for subsequent analysis, inference, plots, ...
- the main cost of your lm() call is not the projection but the resolution of the formula y ~ . from which the model matrix needs to be built

To illustrate Rcpp we wrote a few variants of a function fastLm() doing more of what lm() does (i.e. a bit more than lm.fit() from base R) and measured it. See e.g. this benchmark script, which clearly shows that the dominant cost for smaller data sets is in parsing the formula and building the model matrix; a timing sketch illustrating this split follows below.
In short, you are doing the Right Thing by using benchmarking but you are doing it not all that correctly in trying to compare what is mostly incomparable: a subset with a much larger task.
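To see what the extra time buys you, a minimal sketch of what lm() returns beyond the bare coefficient vector that solve() gives:

fit <- lm(y ~ ., data = data)
names(fit)           # coefficients, residuals, fitted.values, qr, df.residual, ...
coef(summary(fit))   # estimates together with standard errors, t values, p values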