
Summarise over all columns

I have data of the following format:

gen = function () sample.int(10, replace = TRUE)
x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())

I would now like to attach, to each row, the total sum of all the elements in the row (my actual function is more complex but sum illustrates the problem).

Without dplyr, I’d write

cbind(x, Sum = apply(x, 1, sum))

Resulting in:

   A C  G T Sum
1  3 1  6 9  19
2  3 4  3 3  13
3  3 1 10 5  19
4  7 2  1 6  16
…

But it seems surprisingly hard to do this with dplyr.

I’ve tried

library(dplyr)

x %>% rowwise() %>% mutate(Sum = sum(A : T))

But the result is not the sum of the columns of each row, it’s something unexpected and (to me) inexplicable.
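One likely reading of this result, sketched in base R only: inside a row-wise mutate(), A and T are the single values in the current row, so A : T would be the integer sequence running from the value of A to the value of T, and sum() adds up that sequence rather than the row's columns. The values below are taken from the first row of the sample output above; the names row, seq_sum and row_sum are purely illustrative.

```r
# First row of the sample output above, as a plain named list.
row <- list(A = 3L, C = 1L, G = 6L, T = 9L)

# `:` builds an integer sequence between two values; it does not select columns.
seq_sum <- with(row, sum(A:T))   # sum(3:9)

# The intended result: the sum of the four values in the row.
row_sum <- sum(unlist(row))      # 3 + 1 + 6 + 9

seq_sum
# [1] 42
row_sum
# [1] 19
```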

I’ve also tried

x %>% rowwise() %>% mutate(Sum = sum(.))

But here, . is simply a placeholder for the whole x. Unsurprisingly, providing no argument does not work either (the results are all 0). Needless to say, none of these variants works without rowwise() either.
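Both observations can be reproduced in base R, assuming the pipe bound . to the whole data frame: sum() over a data frame returns a single grand total, and sum() with no arguments returns 0. The small data frame below is made up for illustration and is not the random data from the question.

```r
# A small, fixed data frame standing in for x (values are illustrative).
x <- data.frame(A = c(3L, 3L), C = c(1L, 4L), G = c(6L, 3L), T = c(9L, 3L))

# Summing the whole data frame gives one grand total, not one value per row.
grand_total <- sum(x)
grand_total
# [1] 32

# With no arguments, sum() returns 0.
empty_sum <- sum()
empty_sum
# [1] 0
```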

(There isn’t really any reason to necessarily do this in dplyr, but (a) I’d like to keep my code as uniform as possible, and jumping between different APIs doesn’t help; and (b) I’m hoping to one day get automatic and free parallelisation of such commands in dplyr.)

Konrad Rudolph asked Jan 22 '15 17:01



1 Answer

I once did something similar, and at the time I ended up with:

x %>%
  rowwise() %>%
  do(data.frame(., res = sum(unlist(.))))
#    A  C G  T res
# 1  3  2 8  6  19
# 2  6  1 7 10  24
# 3  4  8 6  7  25
# 4  6  4 7  8  25
# 5  6 10 7  2  25
# 6  7  1 2  2  12
# 7  5  4 8  5  22
# 8  9  2 3  2  16
# 9  3  4 7  6  20
# 10 7  5 3  9  24

Perhaps your more complex function works fine without unlist, but it seems to be necessary for sum. Because . refers to the "current group", I initially thought that . for, e.g., the first row in the rowwise machinery would correspond to x[1, ], which is a list, and which sum swallows happily outside do:

is.list((x[1, ]))
# [1] TRUE

sum(x[1, ])
# [1] 19 

However, without unlist in do an error is generated, and I am not sure why:

x %>%
  rowwise() %>%
  do(data.frame(., res = sum(.)))
# Error in sum(.) : invalid 'type' (list) of argument
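
A possible explanation, sketched in base R and assuming that . inside a row-wise do() is a plain list rather than a one-row data frame: sum() has a method for data frames (through the Summary group generic) but rejects plain lists, which would produce exactly this error. The values are taken from the first row of the sample output in the question; the variable names are illustrative.

```r
# One-row data frame, like x[1, ].
row_df <- data.frame(A = 3L, C = 1L, G = 6L, T = 9L)

# The same values as a plain list (as.list() drops the data.frame class).
row_list <- as.list(row_df)

# Data frames are handled by a dedicated method, so this works.
df_sum <- sum(row_df)

# A plain list is rejected; capture the error message instead of stopping.
list_err <- tryCatch(sum(row_list), error = function(e) conditionMessage(e))

# unlist() flattens the list into an atomic vector that sum() accepts.
list_sum <- sum(unlist(row_list))
```

If that assumption holds, it is the unlist() call that makes the do() version above work, since it turns the list into an ordinary integer vector before sum() sees it.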
Henrik answered Sep 30 '22 04:09
