Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Easy way of "adding" two large tables in R?

Tags:

r

aggregate

Is there an easy way of adding the counts for each category in two large tables in R?

... where the tables don't all have exactly the same values present (though they will mostly overlap):

Small example of what I am trying to do. Set up some data:

  x1 <- c(5, 3, 3, 6, 3, 3, 5, 5, 11, 2, 4, 9, 3, 5, 8, 2, 8, 5, 4, 8)
  x2 <- c(6, 10, 9, 17, 6, 7, 8, 11, 5, 12, 14, 5, 11, 7, 7)

  table(x1)
x1
 2  3  4  5  6  8  9 11 
 2  5  2  5  1  3  1  1 

 table(x2)
x2
 5  6  7  8  9 10 11 12 14 17 
 2  2  3  1  1  1  2  1  1  1 

Now I want to combine these tables as if I had done table(c(x1,x2)), getting:

 2  3  4  5  6  7  8  9 10 11 12 14 17 
 2  5  2  7  3  3  4  2  1  3  1  1  1 

But now imagine x1 and x2 are gone (and are really large so I really don't want to recreate them from the tables and actually do table(c(x1,x2))), all I want is to take the tables t1 and t2 and add their (often very large) counts... which I can do in several really clunky ways.

However, this seems like it should be both very common and very easily-solved problem (indeed, I reckon that t1 + t2 ought to work for tables with categories of the same type) but searching for questions on every search term I could think of didn't find anything.

Have I missed a really simple and obvious way to do this?

Edit:

To clarify, something like this (which I did) is not 'simple and obvious' for what must be a very common operation with tables:

 m <- merge(t1,t2,by.x="x1",by.y="x2",all=TRUE)
 m[is.na(m)] <- 0
 oo <- order(m$x1)
 t12 <- m[oo,2]+m[oo,3]
 names(t12) <- m[oo,1]

In particular this is really no simpler nor easier to follow than the brute force approach.

like image 422
Glen_b Avatar asked May 07 '13 09:05

Glen_b


2 Answers

Another way using tapply:

tapply(c(t1,t2), names(c(t1,t2)), sum)
# 10 11 12 14 17  2  3  4  5  6  7  8  9 
#  1  3  1  1  1  2  5  2  7  3  3  4  2 

Here's if you want a sorted output:

w <- c(t1,t2)
# edit: Following G.Grothendieck's suggestion to simplify it further
tapply(w, as.numeric(names(w)), sum)
#  2  3  4  5  6  7  8  9 10 11 12 14 17 
#  2  5  2  7  3  3  4  2  1  3  1  1  1 
like image 142
Arun Avatar answered Sep 28 '22 06:09

Arun


As @PaulHiemstra said, mergeshould do the job. I am not too familiar with it, but this code should work (though there might be more efficient ways to do it...)

x1 <- c(5, 3, 3, 6, 3, 3, 5, 5, 11, 2, 4, 9, 3, 5, 8, 2, 8, 5, 4, 8)
x2 <- c(6, 10, 9, 17, 6, 7, 8, 11, 5, 12, 14, 5, 11, 7, 7)

tx1 <- table(x1)
tx2 <- table(x2)

df1 <- data.frame(names=names(tx1),values=as.vector(tx1))
df2 <- data.frame(names=names(tx2),values=as.vector(tx2))

mdf12 <- merge(df1,df2,by="names",all=TRUE)
mdf12[is.na(mdf12)] <- 0

counts <- mdf12[,2] + mdf12[,3]
names(counts) <- mdf12[,1]

counts[order(as.numeric(names(counts)))]
table(c(x1,x2))

I don't like the is.na step, but I do not know how to make it, that there are 0in the first place instead of NA.

like image 41
Daniel Fischer Avatar answered Sep 28 '22 04:09

Daniel Fischer