
tapply on matrices of data and indices

Tags: r

I am calculating per-group sums for each column of a matrix, where the group labels for each column are held in the matching column of a second matrix. At the moment I am using a loop as follows:

index <- matrix(c("A","A","B","B","B","B","A","A"),4,2)
x <- matrix(1:8,4,2)

for (i in 1:2) {
  # print() is needed: results are not auto-printed inside a for loop
  print(tapply(x[,i], index[,i], sum))
}

At the end of the day I need the following result:

   1  2
A  3 15
B  7 11

Is there a way to do this using matrix operations without a loop? On top of that, the real data is large (e.g. 500 x 10000), so it has to be fast.

Thanks in advance.

andmar25 asked Oct 26 '11 21:10


3 Answers

Here are a couple of solutions:

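# 1: sum over every (level, column) pair, then cross-tabulate into a matrix
ag <- aggregate(c(x), data.frame(index = c(index), col = c(col(x))), sum)
xt <- xtabs(x ~., ag)

# 2: apply rowsum() to each column of x with the matching column of index
m <- mapply(rowsum, as.data.frame(x), as.data.frame(index))
dimnames(m) <- list(levels(factor(index)), 1:ncol(index))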

The second only works if every column of index contains at least one of each level, and it also requires at least 2 levels; however, it is faster.
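
To illustrate the caveat about missing levels, here is a minimal sketch that is not part of the answer above. It uses a single tapply call on the flattened matrices, grouping by level and by column number. The object index2 is a hypothetical input, invented here, in which column 2 has no "B" at all.

```r
index <- matrix(c("A","A","B","B","B","B","A","A"), 4, 2)
x <- matrix(1:8, 4, 2)

# Flatten x and group by (level, column number) in one tapply call
res <- tapply(c(x), list(group = c(index), column = c(col(x))), sum)

# Hypothetical input: column 2 contains only "A"
index2 <- index
index2[, 2] <- "A"
res2 <- tapply(c(x), list(group = c(index2), column = c(col(x))), sum)
# The empty (B, 2) cell is returned as NA; the result keeps its 2 x 2 shape
```

Whether this is fast enough for 500 x 10000 data is untested here; it is meant only to show the shape of the result when a level is absent from a column.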

G. Grothendieck answered Sep 27 '22 18:09


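This is ugly but it works. There is surely a much better, more generalizable way to do it; I'm just getting the ball rolling. Note that it relies on x holding non-negative whole numbers, since the values are used as repeat counts.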

data.frame("col1"=as.numeric(table(rep(index[,1], x[,1]))),
           "col2"=as.numeric(table(rep(index[,2], x[,2]))), 
            row.names=names(table(index)))
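
One possible way to generalize the same rep/table trick to any number of columns is sketched below. This is an assumption about what "more generalizable" could look like, not the author's code; levs and res are names introduced here for illustration.

```r
index <- matrix(c("A","A","B","B","B","B","A","A"), 4, 2)
x <- matrix(1:8, 4, 2)

# Shared levels so that every column produces the same rows
levs <- sort(unique(c(index)))

# For each column, repeat each label x times and count the labels.
# Only valid when x holds non-negative whole numbers.
res <- vapply(seq_len(ncol(x)), function(j) {
  as.vector(table(factor(rep(index[, j], x[, j]), levels = levs)))
}, integer(length(levs)))
dimnames(res) <- list(levs, seq_len(ncol(x)))
```

Because rep materializes sum(x[, j]) labels per column, this is likely to be slow and memory-hungry on large values; it is shown only to make the idea concrete.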

Tyler Rinker answered Sep 27 '22 18:09


I still suspect there's a better option, but this seems reasonably fast actually:

index <- matrix(sample(LETTERS[1:4],size = 500*10000,replace = TRUE),500,10000)
x <- matrix(sample(1:10,500*10000,replace = TRUE),500,10000)

rs <- matrix(NA,4,10000)
rownames(rs) <- LETTERS[1:4]
for (i in LETTERS[1:4]){
    tmp <- x
    tmp[index != i] <- 0
    rs[i,] <- colSums(tmp)
}

It runs in ~0.8 seconds on my machine. I upped the number of categories to four and scaled it up to the size of your data. But I don't like having to copy x each time.

You can get clever with matrix multiplication, but I think you still have to do one row or column at a time.
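
As a rough sketch of the same per-level masking without the explicit tmp copy, the mask can be applied with an elementwise product inside vapply. This is an illustrative variant, not a benchmarked improvement: R still allocates a temporary matrix for each level, and levs, by_level and rs are names chosen here.

```r
index <- matrix(c("A","A","B","B","B","B","A","A"), 4, 2)
x <- matrix(1:8, 4, 2)

levs <- sort(unique(c(index)))

# (index == l) is a logical mask; the elementwise product zeroes out
# every cell that belongs to another level before the column sums
by_level <- vapply(levs, function(l) colSums(x * (index == l)),
                   numeric(ncol(x)))

# vapply returns columns-by-levels, so transpose to levels-by-columns
rs <- t(by_level)
colnames(rs) <- seq_len(ncol(x))
```

Note that `*` here is the elementwise product, not `%*%`, so this still does one pass per level rather than a single matrix multiplication.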

joran answered Sep 27 '22 18:09