 

Sum specific elements of a list

Tags: list, r

a<-list(5,6,8,4,5,2)
b<-c(3,2,1)

I want to sum consecutive elements of "a" in groups whose sizes are given by "b", producing a new list.

That is: (5+6+8), (4+5), 2.

The expected result is:

[[1]]
[1] 19

[[2]]
[1] 9

[[3]]
[1] 2

I use the following code to do this, but I wonder whether there is a more convenient way to solve the problem. Thank you!

p<-rep(1:length(b),b)  # group number of each element of a: 1 1 1 2 2 3
as.list(sapply(1:length(b), function(x) {sum(as.numeric(a)[which(p==x)])}))  # sum each group, return a list
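For reference, here is what the grouping vector p looks like, plus a slightly more defensive restyling of the same approach. This sketch swaps 1:length(b) for seq_along(b) (which gives an empty sequence instead of c(1, 0) when b is empty) and sapply() for vapply(), which fixes the result type to one number per group. It is the same algorithm, not a different one.

```r
a <- list(5, 6, 8, 4, 5, 2)
b <- c(3, 2, 1)

# Group number for each element of a: group 1 three times, group 2 twice, group 3 once
p <- rep(seq_along(b), b)
print(p)  # 1 1 1 2 2 3

# Flatten once, then sum the elements belonging to each group;
# vapply() guarantees exactly one numeric value per group
x <- unlist(a)
sums <- vapply(seq_along(b), function(g) sum(x[p == g]), numeric(1))
result <- as.list(sums)
str(result)
```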
asked Dec 11 '22 17:12 by lightsnail


2 Answers

I've thought of an interesting solution to this problem, which is perhaps a little strange, but I like it:

as.list(diff(c(0,cumsum(a)[cumsum(b)])));
## [[1]]
## [1] 19
##
## [[2]]
## [1] 9
##
## [[3]]
## [1] 2
##

Explanation


First we take the full cumulative sum with cumsum(). Note: I originally thought cumsum() required an atomic vector (as sum() does), so I initially called unlist() before cumsum(), but thanks to @thelatemail for pointing out that it works with lists as well!
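A quick check of the list behaviour described in the note: cumsum() coerces a list whose elements are single numbers to a double vector, whereas sum() rejects a list outright. This only holds when every list element has length one.

```r
a <- list(5, 6, 8, 4, 5, 2)

# cumsum() coerces the list of length-one numbers to a double vector
cs <- cumsum(a)
print(cs)  # 5 11 19 23 28 30

# sum() does not accept a list; capture the error message instead of stopping
msg <- tryCatch(sum(a), error = function(e) conditionMessage(e))
print(msg)

# Flattening with unlist() first works for both functions
total <- sum(unlist(a))
print(total)  # 30
```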

cumsum(a);
## [1]  5 11 19 23 28 30

Then the endpoints of the ranges to be summed are extracted by indexing with cumsum(b), the positions at which each group ends:

cumsum(b);
## [1] 3 5 6
cumsum(a)[cumsum(b)];
## [1] 19 28 30
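The indexing step assumes the group sizes in b add up exactly to length(a). If they add up to less, the trailing elements are silently ignored; if they add up to more, indexing past the end yields NA. A minimal guard under that assumption:

```r
a <- list(5, 6, 8, 4, 5, 2)
b <- c(3, 2, 1)

# The last group must end exactly at the last element of a
stopifnot(sum(b) == length(a))

ends   <- cumsum(b)          # positions where each group ends: 3 5 6
totals <- cumsum(a)[ends]    # running total at each group end: 19 28 30
print(totals)

# The silent failure mode: sizes that overshoot length(a)
over <- cumsum(a)[cumsum(c(3, 2, 2))]  # index 7 is out of range, so the last value is NA
print(over)
```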

We can produce the required sums by prepending a zero and taking diff():

diff(c(0,cumsum(a)[cumsum(b)]));
## [1] 19  9  2
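Written out, the diff() step subtracts the running total just before each group from the running total at its end: with totals 19, 28, 30 and a leading 0, the sums are 19 - 0, 28 - 19 and 30 - 28. The same arithmetic without diff():

```r
a <- list(5, 6, 8, 4, 5, 2)
b <- c(3, 2, 1)

totals <- cumsum(a)[cumsum(b)]     # 19 28 30
prev   <- c(0, head(totals, -1))   # running total just before each group: 0 19 28
sums   <- totals - prev            # 19 9 2, the same as diff(c(0, totals))
print(sums)
```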

And since you want the result as a list, we finally need a call to as.list():

as.list(diff(c(0,cumsum(a)[cumsum(b)])));
## [[1]]
## [1] 19
##
## [[2]]
## [1] 9
##
## [[3]]
## [1] 2
##
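If this is needed more than once, the one-liner can be wrapped in a small helper. The name group_sums is hypothetical, and the checks assume b holds positive whole-number group sizes that add up to length(a) (a leading zero size would produce an index of 0, which R silently drops).

```r
# Hypothetical helper wrapping the cumsum()/diff() one-liner.
# Assumes: a holds single numbers; b holds positive whole-number
# group sizes whose total equals length(a).
group_sums <- function(a, b) {
  stopifnot(all(b > 0), all(b == round(b)), sum(b) == length(a))
  totals <- cumsum(a)[cumsum(b)]   # running total at the end of each group
  as.list(diff(c(0, totals)))      # successive differences are the group sums
}

a <- list(5, 6, 8, 4, 5, 2)
b <- c(3, 2, 1)
str(group_sums(a, b))
```

One design note: because each sum is a difference of running totals, non-integer data can pick up tiny floating-point rounding differences compared with summing each group directly.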

Performance


lightsnail <- function() { p<-rep(1:length(b),b); as.list(sapply(1:length(b), function(x) {sum(as.numeric(a)[which(p==x)])})); };
thelatemail <- function() as.list(tapply(unlist(a), rep(seq_along(b), b), sum)); ## added as.list()
psidom <- function() lapply(split(unlist(a), rep(seq_along(b), b)), sum);
tfc <- function() as.list(aggregate(unlist(a), list(rep(1:length(b),b)), sum)[["x"]]);
user20650 <- function() as.list(rowsum(unlist(a), rep(seq_along(b), b), reorder=FALSE));
bgoldst <- function() as.list(diff(c(0,cumsum(a)[cumsum(b)])));

expected <- list(19,9,2);
identical(expected,lightsnail());
## [1] TRUE
identical(expected,unname(thelatemail())); ## ignore names
## [1] TRUE
identical(expected,unname(psidom())); ## ignore names
## [1] TRUE
identical(expected,tfc());
## [1] TRUE
identical(expected,user20650());
## [1] TRUE
identical(expected,bgoldst());
## [1] TRUE

library(microbenchmark);
microbenchmark(lightsnail(),thelatemail(),psidom(),tfc(),user20650(),bgoldst(),times=1e3L);
## Unit: microseconds
##           expr     min      lq      mean  median      uq      max neval
##   lightsnail()  26.088  33.358  37.34079  37.206  39.344  100.927  1000
##  thelatemail() 121.881 135.139 151.77782 142.837 150.963 3547.386  1000
##       psidom()  48.753  55.595  61.13800  59.016  63.507  276.693  1000
##          tfc() 574.767 613.256 646.64302 628.652 645.757 1923.586  1000
##    user20650()  17.534  23.094  25.49522  25.232  26.943  101.782  1000
##      bgoldst()  10.264  14.969  17.61914  17.535  18.817   82.965  1000
answered Dec 13 '22 06:12 by bgoldst


Another option: lapply(split(unlist(a), rep(seq_along(b), b)), sum)
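Note that the split() approach returns a list named after the groups ("1", "2", "3"), taken from the grouping vector; this is why the benchmark above compares it using unname(). A small sketch showing both forms:

```r
a <- list(5, 6, 8, 4, 5, 2)
b <- c(3, 2, 1)

groups <- rep(seq_along(b), b)                    # 1 1 1 2 2 3
named  <- lapply(split(unlist(a), groups), sum)   # one sum per group, named by group
print(names(named))                               # "1" "2" "3"

# Drop the names to match the expected unnamed list
plain <- unname(named)
str(plain)
```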

answered Dec 13 '22 07:12 by Psidom