
dplyr mutate rowSums calculations or custom functions




I'm trying to create a new variable with mutate from a row-wise calculation, say rowSums, as below:

    iris %>%
      mutate_(sumVar =
                iris %>%
                select(Sepal.Length:Petal.Width) %>%
                rowSums)

The result is that "sumVar" is truncated to its first value (10.2) in every row:

    Source: local data frame [150 x 6]
    Groups: <by row>

       Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
    1           5.1         3.5          1.4         0.2  setosa   10.2
    2           4.9         3.0          1.4         0.2  setosa   10.2
    3           4.7         3.2          1.3         0.2  setosa   10.2
    4           4.6         3.1          1.5         0.2  setosa   10.2
    5           5.0         3.6          1.4         0.2  setosa   10.2
    6           5.4         3.9          1.7         0.4  setosa   10.2
    ..
    Warning message:
    Truncating vector to length 1

Should it be applied rowwise? Or what is the right verb to use for this kind of calculation?


More specifically, is there any way to call a custom function inline with dplyr?

I'm wondering if it is possible to do something like:

    iris %>%
      mutate(sumVar = colsum_function(Sepal.Length:Petal.Width))
leoluyi asked Dec 08 '14 09:12


2 Answers

This is more of a workaround, but it could be used:

iris %>% mutate(sumVar = rowSums(.[1:4])) 
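The positional index `.[1:4]` depends on column order. As a hedged alternative, assuming dplyr 1.0.0 or later (where `across()` exists), the columns can be selected by name inside `mutate()` and handed to `rowSums()`. This is a minimal sketch, not something taken from the answer above:

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, where across() is available.
# across() returns the selected columns as a data frame,
# which rowSums() then sums row by row.
res <- iris %>%
  mutate(sumVar = rowSums(across(Sepal.Length:Petal.Width)))

head(res)
```

Because `rowSums()` is vectorised over the whole data frame, no `rowwise()` step is needed here.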

As noted in the comments, you can also use select() inside mutate() to pick the columns you want to sum, for example:

    iris %>%
      mutate(sumVar = rowSums(select(., contains("Sepal")))) %>%
      head

or

    iris %>%
      mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>%
      head
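The same select-inside-mutate pattern can be wrapped in a small function, which comes close to the inline custom function the question asks about. The sketch below is only an illustration: `row_total` is a hypothetical helper name, not part of dplyr, and it relies on the magrittr `.` placeholder, so it assumes the `%>%` pipe rather than the native `|>` pipe.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): sum the selected columns row by row.
# `df` is the data frame; `...` is forwarded unchanged to select(),
# so any tidyselect expression (ranges, contains(), etc.) can be used.
row_total <- function(df, ...) {
  rowSums(select(df, ...))
}

# `.` is the magrittr placeholder for the data frame on the left of %>%.
res <- iris %>%
  mutate(sumVar = row_total(., Sepal.Length:Petal.Width))

head(res)
```

Swapping `rowSums()` for another row-wise summary such as `rowMeans()` inside the helper should work the same way.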
talat answered Sep 28 '22 05:09


You can use the rowwise() function:

    iris %>%
      rowwise() %>%
      mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width)))

    #> # A tibble: 150 x 6
    #> # Rowwise: 
    #>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
    #>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    <dbl>
    #>  1          5.1         3.5          1.4         0.2 setosa    10.2
    #>  2          4.9         3            1.4         0.2 setosa     9.5
    #>  3          4.7         3.2          1.3         0.2 setosa     9.4
    #>  4          4.6         3.1          1.5         0.2 setosa     9.4
    #>  5          5           3.6          1.4         0.2 setosa    10.2
    #>  6          5.4         3.9          1.7         0.4 setosa    11.4
    #>  7          4.6         3.4          1.4         0.3 setosa     9.7
    #>  8          5           3.4          1.5         0.2 setosa    10.1
    #>  9          4.4         2.9          1.4         0.2 setosa     8.9
    #> 10          4.9         3.1          1.5         0.1 setosa     9.6
    #> # ... with 140 more rows

c_across() uses tidy selection syntax, so you can succinctly select many variables.

Finally, if you want, you can add %>% ungroup() at the end to drop the rowwise grouping.
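As a minimal sketch of that last step, assuming dplyr 1.0.0 or later (needed for `c_across()`), the full pipeline with the grouping removed could look like this:

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, where c_across() is available.
res <- iris %>%
  rowwise() %>%
  mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width))) %>%
  ungroup()  # drop the rowwise grouping so later verbs act on whole columns

# Check whether the result is still a rowwise data frame.
inherits(res, "rowwise_df")
```

Without the `ungroup()` call, later verbs such as `summarise()` would keep operating one row at a time.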

HBat answered Sep 28 '22 05:09
