
dplyr mutate rowSums calculations or custom functions

Tags:

r

dplyr

I'm trying to create a new variable with mutate from a row-wise calculation, such as rowSums, as below:

iris %>%
  mutate_(sumVar =
            iris %>%
            select(Sepal.Length:Petal.Width) %>%
            rowSums)

The result is that "sumVar" is truncated to its first value (10.2) in every row:

Source: local data frame [150 x 6]
Groups: <by row>

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
1           5.1         3.5          1.4         0.2  setosa   10.2
2           4.9         3.0          1.4         0.2  setosa   10.2
3           4.7         3.2          1.3         0.2  setosa   10.2
4           4.6         3.1          1.5         0.2  setosa   10.2
5           5.0         3.6          1.4         0.2  setosa   10.2
6           5.4         3.9          1.7         0.4  setosa   10.2
..
Warning message:
Truncating vector to length 1

Should this be applied rowwise? Or what is the right verb to use for this kind of calculation?

Edit:

More specifically, is there any way to use a custom function inline with dplyr?

I'm wondering if it is possible to do something like:

iris %>%
  mutate(sumVar = colsum_function(Sepal.Length:Petal.Width))
leoluyi asked Dec 08 '14 09:12



2 Answers

This is more of a workaround, but it can be used:

iris %>% mutate(sumVar = rowSums(.[1:4])) 
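As a quick sanity check, here is a minimal sketch (assuming dplyr is attached, which re-exports the magrittr pipe) of why this works: inside the piped call, `.` stands for the whole data frame on the left of %>%, so `.[1:4]` picks the four measurement columns by position. The names `res` and `manual` are only for illustration.

```r
library(dplyr)

# `.` is the magrittr placeholder for the data frame on the left of %>%
res <- iris %>% mutate(sumVar = rowSums(.[1:4]))

# The same totals, computed by adding the four columns by name
manual <- with(iris, Sepal.Length + Sepal.Width + Petal.Length + Petal.Width)

# Compare the two; unname() drops the row names that rowSums() may carry
all.equal(unname(res$sumVar), manual)
```

Note that this relies on column positions, so it will silently sum the wrong columns if their order changes, and that the base R pipe `|>` does not provide the `.` placeholder.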

As noted in the comments, you can also use select() inside mutate() to pick the columns you want to sum, for example

iris %>%
  mutate(sumVar = rowSums(select(., contains("Sepal")))) %>%
  head

or

iris %>%
  mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>%
  head
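The same pattern seems to cover the "inline custom function" part of the question: any function that accepts a data frame and returns one value per row can take the place of rowSums. The sketch below assumes a hypothetical helper, `row_spread` (not part of dplyr), and uses the column range from the question.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): takes a data frame of numeric
# columns and returns max - min for each row.
row_spread <- function(cols) {
  m <- as.matrix(cols)
  apply(m, 1, max) - apply(m, 1, min)
}

res <- iris %>%
  mutate(
    sumVar    = rowSums(select(., Sepal.Length:Petal.Width)),
    spreadVar = row_spread(select(., Sepal.Length:Petal.Width))
  )

head(res)
```

Because the helper receives all selected columns at once, it runs a single time for the whole table rather than once per row.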
talat answered Sep 28 '22 05:09



You can use the rowwise() function:

iris %>%
  rowwise() %>%
  mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width)))

#> # A tibble: 150 x 6
#> # Rowwise:
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa    10.2
#>  2          4.9         3            1.4         0.2 setosa     9.5
#>  3          4.7         3.2          1.3         0.2 setosa     9.4
#>  4          4.6         3.1          1.5         0.2 setosa     9.4
#>  5          5           3.6          1.4         0.2 setosa    10.2
#>  6          5.4         3.9          1.7         0.4 setosa    11.4
#>  7          4.6         3.4          1.4         0.3 setosa     9.7
#>  8          5           3.4          1.5         0.2 setosa    10.1
#>  9          4.4         2.9          1.4         0.2 setosa     8.9
#> 10          4.9         3.1          1.5         0.1 setosa     9.6
#> # ... with 140 more rows

"c_across() uses tidy selection syntax so you can succinctly select many variables"

Finally, if you want, you can add %>% ungroup() at the end to leave rowwise mode, so that later verbs operate on the whole table again.
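Putting these pieces together, a minimal sketch (assuming dplyr 1.0.0 or later, where c_across() is available) might look like the following. The helper `spread` is hypothetical and only illustrates that, under rowwise(), a custom function receives the values of one row as a plain vector.

```r
library(dplyr)

# Hypothetical custom function: works on a plain numeric vector
spread <- function(x) max(x) - min(x)

res <- iris %>%
  rowwise() %>%
  mutate(
    sumVar    = sum(c_across(Sepal.Length:Petal.Width)),
    spreadVar = spread(c_across(Sepal.Length:Petal.Width))
  ) %>%
  ungroup()  # drop the rowwise grouping so later verbs see the whole table

# The rowwise sums should agree with the vectorised rowSums() version
all.equal(res$sumVar, unname(rowSums(iris[1:4])))
```

The rowwise approach calls the function once per row, so on large tables it is likely to be slower than a vectorised function such as rowSums; its advantage is that it works with functions that only understand a single vector.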

HBat answered Sep 28 '22 05:09
