save residuals with `dplyr`

Tags:

r

dplyr

I want to use dplyr to group a data.frame, fit linear regressions and save the residuals as a column in the original, ungrouped data.frame.

Here's an example:

iris %>%
   select(Species, Sepal.Length, Sepal.Width) %>%
   group_by(Species) %>%
   do(mod = lm(Sepal.Length ~ Sepal.Width, data=.))

This returns:

     Species     mod
1     setosa <S3:lm>
2 versicolor <S3:lm>
3  virginica <S3:lm>

Instead, I would like the original data.frame with a new column containing residuals.

For example,

  Sepal.Length Sepal.Width       resid
1          5.1         3.5  0.04428474
2          4.9         3.0  0.18952960
3          4.7         3.2 -0.14856834
4          4.6         3.1 -0.17951937
5          5.0         3.6 -0.12476423
6          5.4         3.9  0.06808885


asked Dec 12 '14 21:12

Austin Richardson

2 Answers

I adapted an example from http://jimhester.github.io/plyrToDplyr/.

r <- iris %>%
  group_by(Species) %>%
  do(model = lm(Sepal.Length ~ Sepal.Width, data=.)) %>%
  do((function(mod) {
     data.frame(resid = residuals(mod$model))
  })(.))

corrected <- cbind(iris, r)
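
Note that cbind(iris, r) lines up correctly only because iris is already sorted by Species, so do() hands the rows back in their original order. If your data are not sorted by group, a base R sketch along these lines should keep each residual on its original row. The helper name add_group_resid and its arguments are made up for illustration, and it assumes the grouping column has no missing values.

```r
# Hypothetical helper (not part of dplyr or the original answer):
# fit one lm per group and return the residuals in the original row order.
add_group_resid <- function(df, formula, group) {
  # One data frame per group; drop = TRUE skips unused factor levels
  pieces <- split(df, df[[group]], drop = TRUE)
  res <- lapply(pieces, function(d) {
    # na.exclude pads residuals with NA for rows dropped by lm,
    # so each vector has as many elements as the group has rows
    unname(residuals(lm(formula, data = d, na.action = na.exclude)))
  })
  # unsplit() puts every value back at the position its row came from
  df$resid <- unsplit(res, df[[group]], drop = TRUE)
  df
}

# Shuffle the rows to show that the result does not depend on sort order
set.seed(1)
shuffled <- iris[sample(nrow(iris)), ]
corrected <- add_group_resid(shuffled, Sepal.Length ~ Sepal.Width, "Species")
head(corrected[, c("Species", "Sepal.Length", "Sepal.Width", "resid")])
```

The row names of corrected still carry the original iris row numbers, so you can check any row against the unshuffled fit.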

Update: another method is to use the augment function from the broom package:

library(broom)

r <- iris %>%
  group_by(Species) %>%
  do(augment(lm(Sepal.Length ~ Sepal.Width, data=.)))

This returns:

Source: local data frame [150 x 10]
Groups: Species

   Species Sepal.Length Sepal.Width  .fitted    .se.fit      .resid       .hat
1   setosa          5.1         3.5 5.055715 0.03435031  0.04428474 0.02073628
2   setosa          4.9         3.0 4.710470 0.05117134  0.18952960 0.04601750
3   setosa          4.7         3.2 4.848568 0.03947370 -0.14856834 0.02738325
4   setosa          4.6         3.1 4.779519 0.04480537 -0.17951937 0.03528008
5   setosa          5.0         3.6 5.124764 0.03710984 -0.12476423 0.02420180
...
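
As of dplyr 1.0.0, do() is superseded. A rough equivalent of the augment approach, assuming a dplyr version that provides group_modify() and an installed broom, might look like the sketch below. Passing data = .x to augment() is my addition so that all original columns are carried along; the exact set of diagnostic columns depends on your broom version, so I have not listed the output.

```r
library(dplyr)
library(broom)

r <- iris %>%
  group_by(Species) %>%
  # .x is the data for one group, without the grouping column;
  # augment() appends .fitted, .resid and other diagnostics to it
  group_modify(~ augment(lm(Sepal.Length ~ Sepal.Width, data = .x), data = .x)) %>%
  ungroup()  # drop the grouping so the result is a plain tibble

head(r[, c("Species", "Sepal.Length", "Sepal.Width", ".resid")])
```

group_modify() returns the groups in sorted order, so on unsorted input the rows will not be in their original order.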


answered Sep 18 '22 23:09

Austin Richardson

A solution that seems simpler than the ones proposed so far, and closer to the code in the original question, is:

iris %>%
   group_by(Species) %>%
   do(data.frame(., resid = residuals(lm(Sepal.Length ~ Sepal.Width, data=.))))

Result:

# A tibble: 150 x 6
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   resid
          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
 1          5.1         3.5          1.4         0.2 setosa   0.0443
 2          4.9         3            1.4         0.2 setosa   0.190 
 3          4.7         3.2          1.3         0.2 setosa  -0.149 
 4          4.6         3.1          1.5         0.2 setosa  -0.180 
 5          5           3.6          1.4         0.2 setosa  -0.125 
 6          5.4         3.9          1.7         0.4 setosa   0.0681
 7          4.6         3.4          1.4         0.3 setosa  -0.387 
 8          5           3.4          1.5         0.2 setosa   0.0133
 9          4.4         2.9          1.4         0.2 setosa  -0.241 
10          4.9         3.1          1.5         0.1 setosa   0.120
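
If only the residual column is needed, a grouped mutate() may be enough and avoids do() altogether. This is a sketch that assumes a reasonably recent dplyr. The formula is evaluated inside mutate(), so Sepal.Length and Sepal.Width refer to the current group's rows. The na.action = na.exclude argument is my addition, so that the residual vector keeps one value per row even if some rows have missing data.

```r
library(dplyr)

corrected <- iris %>%
  group_by(Species) %>%
  # One lm per Species; residuals() returns one value per row of the group
  mutate(resid = residuals(lm(Sepal.Length ~ Sepal.Width,
                              na.action = na.exclude))) %>%
  ungroup()  # return an ungrouped tibble, as the question asks

head(corrected[, c("Sepal.Length", "Sepal.Width", "resid")])
```

Because mutate() does not reorder rows, the residuals stay aligned with the original data even when it is not sorted by Species.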

answered Sep 20 '22 23:09

Gilles
