For example, is it possible to do this in dplyr:
new_name <- "Sepal.Sum"
col_grep <- "Sepal"
iris <- cbind(iris, tmp_name = rowSums(iris[,grep(col_grep, names(iris))]))
names(iris)[names(iris) == "tmp_name"] <- new_name
This adds up all the columns that contain "Sepal" in the name and creates a new variable named "Sepal.Sum".
Importantly, the solution needs to rely on a grep
(or dplyr:::matches
, dplyr:::one_of
, etc.) when selecting the columns for the rowSums
function, and have the name of the new column be dynamic.
My application has many new columns being created in a loop, so an even better solution would use mutate_each_
to generate many of these new columns.
Here a dplyr
solution that uses the contains
special functions to be used inside select
.
iris %>% mutate(Sepal.Sum = iris %>% rowwise() %>% select(contains("Sepal")) %>% rowSums()) -> iris2
head(iris2)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
and here the benchmarks:
Unit: milliseconds
expr
iris2 <- iris %>% mutate(Sepal.Sum = iris %>% rowwise() %>% select(contains("Sepal")) %>% rowSums())
min lq mean median uq max neval
1.816496 1.86304 2.132217 1.928748 2.509996 5.252626 100
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With