Using 'mutate_' to sum a bunch of columns row-wise

Tags: r, dplyr

In this blog post, Paul Hiemstra shows how to sum two columns using dplyr::mutate_. Copying the relevant parts:

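library(dplyr)     # provides %>% and mutate_()
library(lazyeval)  # provides interp()
f = function(col1, col2, new_col_name) {
    # build the formula ~ <col1> + <col2> from the column names given as strings
    mutate_call = lazyeval::interp(~ a + b, a = as.name(col1), b = as.name(col2))
    # name the single call so mutate_ creates a column called new_col_name
    mtcars %>% mutate_(.dots = setNames(list(mutate_call), new_col_name))
}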

allows one to then do:

head(f('wt', 'mpg', 'hahaaa'))

Great!
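
For anyone wondering what interp() actually hands to mutate_(), here is a small sketch that inspects the formula built for the 'wt' and 'mpg' columns and evaluates its right-hand side against mtcars with base eval(). The variable names (mutate_call aside) are mine and not from Paul's post.

```r
library(lazyeval)

# Build the same formula that f('wt', 'mpg', ...) builds internally
mutate_call <- lazyeval::interp(~ a + b, a = as.name("wt"), b = as.name("mpg"))

# The placeholders a and b have been replaced by the column names
print(mutate_call)
vars_used <- all.vars(mutate_call)

# A one-sided formula keeps its expression in element 2;
# evaluating that expression with mtcars as the data gives the row-wise sum
manual_sum <- eval(mutate_call[[2]], mtcars)
head(manual_sum)
```

So mutate_() only ever sees an ordinary expression over column names; the string handling happens entirely in interp().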

I followed up with a question (see comments) as to how one could extend this to 100 columns, since it wasn't quite clear (to me) how to do it without typing all the names using the above method. Paul was kind enough to indulge me and provided this answer (thanks!):

# data
df = data.frame(matrix(1:100, 10, 10))
names(df) = LETTERS[1:10]

# answer
sum_all_rows = function(list_of_cols) {
  summarise_calls = sapply(list_of_cols, function(col) {
    lazyeval::interp(~col_name, col_name = as.name(col))
  })
  df %>% select_(.dots = summarise_calls) %>% mutate(ans1 = rowSums(.))
}
sum_all_rows(LETTERS[sample(1:10, 5)])

I'd like to improve this answer on these points:

  1. The other columns are gone. I'd like to keep them.

  2. It uses rowSums() which has to coerce the data.frame to a matrix which I'd like to avoid.

    Also, I'm not sure whether using . within non-do() verbs is encouraged, because . within mutate() doesn't seem to adapt to just the rows of each group when used with group_by().

  3. And most importantly, how can I do the same using mutate_() instead of mutate()?

I found this answer, which addresses point 1, but unfortunately, both dplyr answers use rowSums() along with mutate().


PS: I just read Hadley's comment under that answer. IIUC, 'reshape to long form + group by + sum + reshape to wide form' is the recommended dplyr way for these types of operations?
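
To make that PS concrete, here is a rough sketch of what the long-form route might look like. It is my reading of the idea, not code from Hadley's comment. It assumes a current dplyr and tidyr (pivot_longer() and all_of() are newer than this question), and it joins the per-row totals back onto the data instead of literally reshaping back to wide form. The helper names row_id, cols and row_totals are made up for illustration.

```r
library(dplyr)
library(tidyr)

# same toy data as in the question
df <- data.frame(matrix(1:100, 10, 10))
names(df) <- LETTERS[1:10]
cols <- c("A", "C", "E")   # columns to sum, chosen arbitrarily

# an explicit row id is needed so the long form can be grouped by row
df_id <- df %>% mutate(row_id = row_number())

row_totals <- df_id %>%
  select(row_id, all_of(cols)) %>%
  pivot_longer(cols = all_of(cols), names_to = "column", values_to = "value") %>%
  group_by(row_id) %>%
  summarise(ans1 = sum(value))

# attach the totals to the original columns and drop the helper id
result <- df_id %>%
  left_join(row_totals, by = "row_id") %>%
  select(-row_id)

head(result)
```

All original columns survive and rowSums() is not involved, at the cost of a reshape and a join.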

asked Sep 28 '15 14:09 by Arun


1 Answer

Here's a different approach:

library(dplyr); library(lazyeval)
f <- function(df, list_of_cols, new_col) {
  df %>% 
    mutate_(.dots = ~Reduce(`+`, .[list_of_cols])) %>% 
    setNames(c(names(df), new_col))
}

head(f(mtcars, c("mpg", "cyl"), "x"))
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb    x
#1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 27.0
#2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 27.0
#3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 26.8
#4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 27.4
#5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 26.7
#6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 24.1
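
If you would rather name the new column directly, in the style of the two-column function from the blog post, one possible variant is to fold the column names into a single expression first. This is only a sketch: sum_cols and sum_expr are names I made up, and it assumes an installed dplyr that still ships mutate_() (it is deprecated in newer releases and may emit a warning).

```r
library(dplyr)
library(lazyeval)

sum_cols <- function(df, cols, new_col) {
  # Fold the column names into one expression, e.g. mpg + cyl + gear
  sum_expr <- Reduce(function(lhs, rhs) call("+", lhs, rhs), lapply(cols, as.name))
  # Wrap the expression in a formula, as interp() did for two columns
  mutate_call <- lazyeval::interp(~ e, e = sum_expr)
  # Naming the list element names the new column
  df %>% mutate_(.dots = setNames(list(mutate_call), new_col))
}

res <- sum_cols(mtcars, c("mpg", "cyl", "gear"), "x")
head(res)
```

Because the expression refers to the columns by name, it does not depend on . at all.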

Regarding your points:

  • Other columns are kept
  • It doesn't use rowSums
  • You are specifically asking for a row-wise operation here so I'm not sure (yet) how a group_by could do any harm when using . inside mutate/mutate_
  • It makes use of mutate_
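
As a quick sanity check on the Reduce() idea, this base-R sketch compares it with rowSums() on the toy data from the question. The column choice and variable names are arbitrary.

```r
# same toy data as in the question
df <- data.frame(matrix(1:100, 10, 10))
names(df) <- LETTERS[1:10]
cols <- c("A", "C", "E")

# Reduce() adds the columns pairwise as plain vectors, with no matrix conversion
via_reduce <- Reduce(`+`, df[cols])

# rowSums() converts the selected columns to a matrix first
via_rowsums <- rowSums(df[cols])

# same totals either way (rowSums() returns a named double vector)
same_totals <- isTRUE(all.equal(via_reduce, unname(via_rowsums)))
same_totals
```

One difference to keep in mind: rowSums() has an na.rm argument, while the plain `+` in Reduce() propagates NA.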
answered Sep 22 '22 12:09 by talat