R: Applying a function to all row-pairs of a matrix without for loop

I want to compute pairwise comparisons between all rows of a matrix. A double for loop obviously works, but it is extremely expensive for a large dataset.

I looked into implicit loops such as apply(), but I have no clue how to avoid the inner loop.

How can it be achieved?

rpylearning asked Jun 07 '11 17:06


2 Answers

I'm assuming you're trying to do some type of comparison across all row-pairs of a matrix. You could use outer() to run through all pairs of row indices and apply a vectorized comparison function to each row-pair. For example, you could calculate the squared Euclidean distance between all row-pairs as follows:

> m <- matrix(1:12, 4, 3)
> outer(1:4,1:4, FUN = Vectorize( function(i,j) sum((m[i,]-m[j,])^2 )) )
     [,1] [,2] [,3] [,4]
[1,]    0    3   12   27
[2,]    3    0    3   12
[3,]   12    3    0    3
[4,]   27   12    3    0
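For the specific case of Euclidean distances, base R's dist() (from the stats package) already computes the distances between all row-pairs in compiled code, so no R-level loop is needed at all. A minimal sketch, assuming squared Euclidean distance is the comparison you want; for arbitrary comparison functions you still need something like outer() or combn():

```r
# Same example matrix as above: 4 rows, 3 columns
m <- matrix(1:12, 4, 3)

# dist() computes Euclidean distances between all pairs of rows;
# as.matrix() expands the triangular "dist" object into a full symmetric matrix
d2 <- as.matrix(dist(m))^2

# as.matrix() labels rows/columns "1".."4"; drop them to match the outer() result
dimnames(d2) <- NULL
print(d2)
```

The values agree with the outer() result up to floating-point rounding, since dist() takes a square root that is then squared again.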
Prasad Chalasani answered Nov 01 '22 07:11


outer() works fine if you don't mind self-comparisons such as 1-1 and 2-2 (the diagonal values of the result matrix). outer() also performs both the 1-2 and the 2-1 comparisons.

Most of the time, pairwise comparisons only require the triangular set, without the self-comparisons and mirror comparisons. To get only the triangular comparisons, use combn().
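Applied to the original question, combn() can drive the row comparison directly through its FUN argument, which is called once per pair. A minimal sketch; the matrix `mat` is hypothetical example data, and squared Euclidean distance stands in for whatever comparison you actually need:

```r
# Hypothetical example data: 4 rows to compare pairwise
mat <- matrix(1:12, nrow = 4, ncol = 3)

# combn(n, 2) enumerates each unordered pair (i, j) with i < j exactly once.
# FUN is applied to every pair, so no explicit loop is written.
pair_dist <- combn(nrow(mat), 2, FUN = function(p) {
  sum((mat[p[1], ] - mat[p[2], ])^2)  # squared Euclidean distance of rows i and j
})

# Label each result with its row pair; extra arguments to combn() are passed on to FUN
names(pair_dist) <- combn(nrow(mat), 2, FUN = paste, collapse = "-")
print(pair_dist)
```

For an n-row matrix this evaluates n(n-1)/2 comparisons (6 for 4 rows) instead of the n^2 (16) that outer() evaluates.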

Here is sample output showing the difference between outer() and combn():

> v <- c(1,2,3,4)
> outer(v, v, function(x, y) print(paste(x, "-", y)))
 [1] "1 - 1" "2 - 1" "3 - 1" "4 - 1" "1 - 2" "2 - 2" "3 - 2" "4 - 2" "1 - 3" "2 - 3" "3 - 3" "4 - 3" "1 - 4" "2 - 4" "3 - 4" "4 - 4"

Note the "1 - 1" self-comparisons above, and the "1 - 2" and "2 - 1" mirror comparisons. Contrast this with the following:

> library(plyr)  # a_ply() comes from the plyr package
> v <- c(1,2,3,4)
> allPairs <- combn(length(v), 2) # each column is one pair chosen from 1:length(v)
> a_ply(allPairs, 2, function(x) print(paste(x[1], "--", x[2]))) # iterate over the columns (pairs)
[1] "1 -- 2"
[1] "1 -- 3"
[1] "1 -- 4"
[1] "2 -- 3"
[1] "2 -- 4"
[1] "3 -- 4" 

These are exactly the pairs (i, j) with i < j, i.e. the upper-triangular part of the outer() result matrix.
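The a_ply() call depends on plyr; base R's apply() over the columns of the pair matrix does the same iteration. If you later need the full symmetric matrix back, the triangular results can be scattered into it. A minimal sketch; the absolute-difference comparison is a hypothetical stand-in, and the scatter step relies on combn(n, 2) listing pairs in the same column-major order that lower.tri() visits cells, with row and column swapped:

```r
v <- c(1, 2, 3, 4)
allPairs <- combn(length(v), 2)  # 2 x 6 matrix, one pair (i, j) with i < j per column

# Base-R alternative to plyr::a_ply: apply() over the columns (MARGIN = 2).
# Hypothetical comparison: absolute difference of the two elements.
vals <- apply(allPairs, 2, function(p) abs(v[p[1]] - v[p[2]]))

# Rebuild the full symmetric matrix from the triangular results.
# lower.tri() visits cells (2,1), (3,1), (4,1), (3,2), (4,2), (4,3) in
# column-major order, matching combn's pair order (1,2), (1,3), (1,4),
# (2,3), (2,4), (3,4) with row and column swapped.
res <- matrix(0, length(v), length(v))
res[lower.tri(res)] <- vals
res <- res + t(res)  # mirror; the diagonal (self-comparisons) stays 0
print(res)
```

Only the n(n-1)/2 comparisons are computed; the mirrored half and the zero diagonal are filled in without calling the comparison again.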

outer() is more apt when you need pairwise operations between two different vectors. For pairwise operations within a single vector, combn() is usually enough.

For example, if you are calling outer(x, x, ...), you are perhaps doing it wrong; consider combn(length(x), 2) instead.
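To illustrate the case where outer() fits naturally: comparing every row of one matrix with every row of a different matrix. Here every (i, j) combination is a distinct comparison, so there are no self- or mirror comparisons to skip. A minimal sketch; the matrices `A` and `B` are hypothetical, and squared Euclidean distance is again only a stand-in comparison:

```r
# Two hypothetical matrices with the same number of columns (3)
A <- matrix(1:6, nrow = 2)  # 2 rows
B <- matrix(1:9, nrow = 3)  # 3 rows

# outer() over the two index vectors visits every (row of A, row of B) pair once;
# Vectorize() lets the scalar comparison accept the index vectors outer() passes in
cross <- outer(seq_len(nrow(A)), seq_len(nrow(B)),
               FUN = Vectorize(function(i, j) sum((A[i, ] - B[j, ])^2)))

print(dim(cross))  # 2 3, i.e. nrow(A) x nrow(B)
print(cross)
```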

Gopalakrishna Palem answered Nov 01 '22 07:11