Is there an easy way of adding the counts for each category in two large tables in R?
... where the tables don't all have exactly the same values present (though they will mostly overlap):
Small example of what I am trying to do. Set up some data:
x1 <- c(5, 3, 3, 6, 3, 3, 5, 5, 11, 2, 4, 9, 3, 5, 8, 2, 8, 5, 4, 8)
x2 <- c(6, 10, 9, 17, 6, 7, 8, 11, 5, 12, 14, 5, 11, 7, 7)
table(x1)
x1
2 3 4 5 6 8 9 11
2 5 2 5 1 3 1 1
table(x2)
x2
5 6 7 8 9 10 11 12 14 17
2 2 3 1 1 1 2 1 1 1
Now I want to combine these tables as if I had done table(c(x1,x2))
, getting:
2 3 4 5 6 7 8 9 10 11 12 14 17
2 5 2 7 3 3 4 2 1 3 1 1 1
But now imagine x1 and x2 are gone (and are really large so I really don't
want to recreate them from the tables and actually do table(c(x1,x2))
),
all I want is to take the tables t1
and t2
and add their (often very large) counts... which I can do in several really clunky ways.
However, this seems like it should be both very common and very easily-solved
problem (indeed, I reckon that t1 + t2
ought to work for tables with categories
of the same type) but searching for questions on every search term I could think
of didn't find anything.
Have I missed a really simple and obvious way to do this?
Edit:
To clarify, something like this (which I did) is not 'simple and obvious' for what must be a very common operation with tables:
m <- merge(t1,t2,by.x="x1",by.y="x2",all=TRUE)
m[is.na(m)] <- 0
oo <- order(m$x1)
t12 <- m[oo,2]+m[oo,3]
names(t12) <- m[oo,1]
In particular this is really no simpler nor easier to follow than the brute force approach.
Another way using tapply
:
tapply(c(t1,t2), names(c(t1,t2)), sum)
# 10 11 12 14 17 2 3 4 5 6 7 8 9
# 1 3 1 1 1 2 5 2 7 3 3 4 2
Here's if you want a sorted output:
w <- c(t1,t2)
# edit: Following G.Grothendieck's suggestion to simplify it further
tapply(w, as.numeric(names(w)), sum)
# 2 3 4 5 6 7 8 9 10 11 12 14 17
# 2 5 2 7 3 3 4 2 1 3 1 1 1
As @PaulHiemstra said, merge
should do the job. I am not too familiar with it, but this code should work (though there might be more efficient ways to do it...)
x1 <- c(5, 3, 3, 6, 3, 3, 5, 5, 11, 2, 4, 9, 3, 5, 8, 2, 8, 5, 4, 8)
x2 <- c(6, 10, 9, 17, 6, 7, 8, 11, 5, 12, 14, 5, 11, 7, 7)
tx1 <- table(x1)
tx2 <- table(x2)
df1 <- data.frame(names=names(tx1),values=as.vector(tx1))
df2 <- data.frame(names=names(tx2),values=as.vector(tx2))
mdf12 <- merge(df1,df2,by="names",all=TRUE)
mdf12[is.na(mdf12)] <- 0
counts <- mdf12[,2] + mdf12[,3]
names(counts) <- mdf12[,1]
counts[order(as.numeric(names(counts)))]
table(c(x1,x2))
I don't like the is.na
step, but I do not know how to make it, that there are 0
in the first place instead of NA
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With