How do I sum the values of columns in several tables if tables have different lengths?

Tags: r

Alright, this should be an easy one but I'm looking for a solution that's as fast as possible.

Let's say I have 3 tables (the number of tables will be much larger):

tab1 <- table(c(1, 1, 1, 2, 2, 3, 3, 3))
tab2 <- table(c(1, 1, 4, 4, 4))
tab3 <- table(c(1, 1, 2, 3, 5))

This is what we get:

> tab1
1 2 3 
3 2 3 
> tab2
1 4 
2 3 
> tab3
1 2 3 5 
2 1 1 1 

What I want, computed quickly enough to work with many large tables, is this:

1 2 3 4 5
7 3 4 3 1

So, basically the tables get aggregated over all names. Is there an elementary function that does this which I am missing? Thanks for your help!

swolf asked Jun 17 '15 12:06



3 Answers

We concatenate the tables with c() to create 'v1', then use tapply() to sum the elements grouped by the names of that vector.

v1 <- c(tab1, tab2, tab3)
tapply(v1, names(v1), FUN=sum)
#1 2 3 4 5 
#7 3 4 3 1 
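With many tables, typing each one into c() by hand is impractical. A minimal sketch, assuming the tables are collected in a list (the name `tabs` is hypothetical and not from the question), applies the same concatenate-then-tapply idea:

```r
# Recreate the question's tables and collect them in a list.
# `tabs` is an illustrative name, not from the original post.
tabs <- list(
  table(c(1, 1, 1, 2, 2, 3, 3, 3)),
  table(c(1, 1, 4, 4, 4)),
  table(c(1, 1, 2, 3, 5))
)

# Concatenate all tables into one named vector of counts.
# unname() guards against list names being prefixed onto the value names.
v <- do.call(c, unname(tabs))

# Sum the counts grouped by name.
res <- tapply(v, names(v), FUN = sum)
res
```

The list could come from lapply() over files or data chunks, so the number of tables does not need to be known in advance.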
akrun answered Oct 20 '22 21:10


You could use rowsum(). Its output is a one-column matrix rather than the named vector you show, but you can always restructure it after the calculation. rowsum() is noticeably faster than tapply() for grouped sums, as the benchmark below shows.

x <- c(tab1, tab2, tab3)
rowsum(x, names(x))
#   [,1]
# 1    7
# 2    3
# 3    4
# 4    3
# 5    1
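If the named-vector layout from the question is needed, the matrix can be restructured afterwards. The following is a sketch rather than part of the original answer; the names `m` and `totals` are made up for illustration:

```r
# Same data as in the question.
tab1 <- table(c(1, 1, 1, 2, 2, 3, 3, 3))
tab2 <- table(c(1, 1, 4, 4, 4))
tab3 <- table(c(1, 1, 2, 3, 5))

x <- c(tab1, tab2, tab3)
m <- rowsum(x, names(x))   # one-column matrix; row names are the groups

# Selecting the single column drops the matrix to a vector and keeps
# the row names as names.
totals <- m[, 1]

# The groups here are character strings, so they sort as text
# ("10" would come before "2"). If the names are numeric labels,
# reorder them numerically:
totals <- totals[order(as.numeric(names(totals)))]
totals
```

The numeric reordering step only matters once the labels have more than one digit; for the example data it leaves the order unchanged.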

Here's a benchmark with akrun's data.table suggestion added in as well.

library(microbenchmark)
library(data.table)

xx <- rep(x, 1e5)

microbenchmark(
    tapply = tapply(xx, names(xx), FUN=sum),
    rowsum = rowsum(xx, names(xx)),
    data.table = data.table(xx, names(xx))[, sum(xx), by = V2]
)
# Unit: milliseconds
#        expr       min        lq      mean    median        uq       max neval
#      tapply 150.47532 154.80200 176.22410 159.02577 204.22043 233.34346   100
#      rowsum  41.28635  41.65162  51.85777  43.33885  45.43370 109.91777   100
#  data.table  21.39438  24.73580  35.53500  27.56778  31.93182  92.74386   100
Rich Scriven answered Oct 20 '22 20:10


You can try this:

df <- rbind(as.matrix(tab1), as.matrix(tab2), as.matrix(tab3))
aggregate(df, by=list(row.names(df)), FUN=sum)
#   Group.1 V1
# 1       1  7
# 2       2  3
# 3       3  4
# 4       4  3
# 5       5  1
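A related variation, offered as a sketch and not part of the original answer, stacks the tables as data frames and lets xtabs() do the summing, which returns a table-like object closer to the layout in the question. The names `tabs` and `long` are hypothetical:

```r
# The question's tables, collected in a list (illustrative name).
tabs <- list(
  table(c(1, 1, 1, 2, 2, 3, 3, 3)),
  table(c(1, 1, 4, 4, 4)),
  table(c(1, 1, 2, 3, 5))
)

# as.data.frame() on a one-dimensional table gives the columns
# Var1 (the labels) and Freq (the counts); rbind() stacks them.
long <- do.call(rbind, lapply(tabs, as.data.frame))

# Sum Freq within each level of Var1.
res <- xtabs(Freq ~ Var1, data = long)
res
```

This builds a data frame per table, so it is unlikely to beat rowsum() on speed; it is mainly convenient when a table-shaped result is wanted.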
Mamoun Benghezal answered Oct 20 '22 20:10