Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do rowSums over many columns in ``dplyr`` or ``tidyr``?

Tags:

r

dplyr

tidyr

For example, is it possible to do this in dplyr:

new_name <- "Sepal.Sum"
col_grep <- "Sepal"

iris <- cbind(iris, tmp_name = rowSums(iris[,grep(col_grep, names(iris))]))
names(iris)[names(iris) == "tmp_name"] <- new_name

This adds up all the columns that contain "Sepal" in the name and creates a new variable named "Sepal.Sum".

Importantly, the solution needs to rely on a grep (or dplyr:::matches, dplyr:::one_of, etc.) when selecting the columns for the rowSums function, and have the name of the new column be dynamic.

My application has many new columns being created in a loop, so an even better solution would use mutate_each_ to generate many of these new columns.

like image 453
kennyB Avatar asked Dec 19 '22 01:12

kennyB


1 Answers

Here a dplyr solution that uses the contains special functions to be used inside select.

 iris %>% mutate(Sepal.Sum = iris %>% rowwise() %>% select(contains("Sepal")) %>% rowSums()) -> iris2
 head(iris2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
1          5.1         3.5          1.4         0.2  setosa       8.6
2          4.9         3.0          1.4         0.2  setosa       7.9
3          4.7         3.2          1.3         0.2  setosa       7.9
4          4.6         3.1          1.5         0.2  setosa       7.7
5          5.0         3.6          1.4         0.2  setosa       8.6
6          5.4         3.9          1.7         0.4  setosa       9.3

and here the benchmarks:

Unit: milliseconds
                                                                                                      expr
 iris2 <- iris %>% mutate(Sepal.Sum = iris %>% rowwise() %>% select(contains("Sepal")) %>%      rowSums())
      min      lq     mean   median       uq      max neval
 1.816496 1.86304 2.132217 1.928748 2.509996 5.252626   100
like image 176
SabDeM Avatar answered Jan 11 '23 08:01

SabDeM