 

Apply a function to all pairwise combinations of list elements in R

I want to apply a function to all pairwise combinations of list elements. Each element is a vector of the same length. I would like the output as an n x n matrix, where n is the number of elements in my list.

Consider the following example:

# Generating data
l <- list()
for(i in 1:5) l[[i]] <- sample(0:9, 5, replace = TRUE)

# Function to apply
foo <- function(x, y) 1 - sum(x * y) / sqrt(sum(x ^ 2) * sum(y ^ 2))
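Written out, foo is one minus the cosine similarity of x and y (often called the cosine distance):

$$
\mathrm{foo}(x, y) = 1 - \frac{\sum_{k} x_k y_k}{\sqrt{\sum_{k} x_k^2 \, \sum_{k} y_k^2}} = 1 - \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
$$

Swapping x and y leaves it unchanged, and foo(x, x) is mathematically 0 for any non-zero x (up to floating-point rounding), so the n x n result is symmetric with a zero diagonal. If either vector is all zeros, the denominator is 0 and R returns NaN.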

# Generating combinations
comb <- expand.grid(x = 1:5, y = 1:5)
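Because foo is symmetric (swapping x and y changes neither sum(x * y) nor the product of the squared norms) and mathematically zero on the diagonal, expand.grid generates about twice the pairs that are actually needed. A sketch of an alternative using base R's `combn()`, which enumerates only the n(n - 1)/2 pairs with i < j. The names `pairs`, `vals` and `res` are illustrative, and the data are drawn from 1:9 instead of 0:9 (an assumption) so that no vector is all zeros:

```r
foo <- function(x, y) 1 - sum(x * y) / sqrt(sum(x ^ 2) * sum(y ^ 2))

set.seed(1)
# Assumption: values from 1:9 rather than 0:9, so no vector is all zeros
l <- lapply(1:5, function(i) sample(1:9, 5, replace = TRUE))
n <- length(l)

# Each column of 'pairs' is one unordered pair (i, j) with i < j
pairs <- combn(n, 2)

# Evaluate foo once per unordered pair
vals <- apply(pairs, 2, function(p) foo(l[[p[1]]], l[[p[2]]]))

# Fill both triangles, since foo is symmetric; the diagonal is set to
# exactly 0, which foo(x, x) equals up to rounding
res <- matrix(0, n, n)
res[t(pairs)] <- vals          # upper triangle: [i, j]
res[t(pairs[2:1, ])] <- vals   # lower triangle: [j, i]
```

This halves the number of foo calls, but `apply()` is still an R-level loop, so the saving comes only from making fewer calls, not from vectorization.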

This loop works, but it is slow and returns a list rather than an n x n matrix:

# Applying function
out <- list()
for(i in seq_len(nrow(comb))) {
  out[[i]] <- foo(l[[comb[i, 'x']]], l[[comb[i, 'y']]])
}
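A minimal sketch, not from the thread, of getting the n x n matrix while keeping the expand.grid approach: `mapply()` walks the two index columns of comb in parallel. Because expand.grid varies x fastest and `matrix()` fills column by column, entry [i, j] of the result is foo(l[[i]], l[[j]]). The names `vals` and `out_mat` are illustrative:

```r
foo <- function(x, y) 1 - sum(x * y) / sqrt(sum(x ^ 2) * sum(y ^ 2))

set.seed(1)
l <- lapply(1:5, function(i) sample(0:9, 5, replace = TRUE))
comb <- expand.grid(x = seq_along(l), y = seq_along(l))

# mapply walks comb$x and comb$y in parallel, replacing the explicit for loop
vals <- mapply(function(i, j) foo(l[[i]], l[[j]]), comb$x, comb$y)

# expand.grid varies x fastest and matrix() fills column-major,
# so out_mat[i, j] is foo(l[[i]], l[[j]])
out_mat <- matrix(vals, nrow = length(l), ncol = length(l))
```

`mapply()` is still an R-level loop, so this mainly fixes the output format; it should not be expected to be dramatically faster than the original loop.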

Any ideas?

asked Jan 25 '16 at 17:01 by goclem


1 Answer

A nested sapply would do the trick:

sapply(l, function(x) sapply(l, function(y) foo(x,y)))
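To see how the nested sapply lays out its result, here is a check on a small fixed list (the list and its names are made up for illustration). The outer sapply runs over x and produces one column per element; the inner sapply runs over y and fills that column's rows, so `res[i, j]` is foo(l[[j]], l[[i]]). Since foo is symmetric, that equals foo(l[[i]], l[[j]]):

```r
foo <- function(x, y) 1 - sum(x * y) / sqrt(sum(x ^ 2) * sum(y ^ 2))

# Small hypothetical list whose distances are easy to check by hand
l <- list(a = c(1, 0, 0), b = c(0, 1, 0), c = c(1, 1, 0))

# Outer sapply: one column per x; inner sapply: one row per y
res <- sapply(l, function(x) sapply(l, function(y) foo(x, y)))
res
```

Names on l carry through to the row and column names of the result, which makes the matrix easier to read. Here a and b are orthogonal (distance 1), and each is at distance 1 - 1/sqrt(2) from c.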

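I was interested in @A. Webb's solution, outer(l, l, Vectorize(foo)), so here is some benchmarking. (In the first call, time=1000 should have been times=1000. As written, microbenchmark times it as a trivial third expression, which is the time row in the output, and runs each expression the default 100 times.)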

R> for(i in 1:50) l[[i]] <- sample(0:9, 5, T)
R> microbenchmark(sapply(l, function(x) sapply(l, function(y) foo(x,y))), outer(l,l,Vectorize(foo)), time=1000)
Unit: nanoseconds
                                                    expr     min        lq      mean    median        uq      max neval
 sapply(l, function(x) sapply(l, function(y) foo(x, y))) 7493739 8479127.0 1.042e+07 1.027e+07 1.155e+07 17982289   100
                             outer(l, l, Vectorize(foo)) 6778098 8316362.5 1.030e+07 1.002e+07 1.187e+07 16076063   100
                                                    time       5      48.5 1.672e+02 1.385e+02 1.875e+02      914   100

R> for(i in 1:500) l[[i]] <- sample(0:9, 5, T)
R> microbenchmark(sapply(l, function(x) sapply(l, function(y) foo(x,y))), outer(l,l,Vectorize(foo)), times=100)
Unit: milliseconds
                                                    expr   min    lq  mean median    uq  max neval
 sapply(l, function(x) sapply(l, function(y) foo(x, y))) 677.3 768.5 820.4  815.9 842.7 1278   100
                             outer(l, l, Vectorize(foo)) 828.6 903.0 958.3  930.7 960.5 1819   100

So for the 50-element list the outer solution is marginally faster (median 10.02 ms vs. 10.27 ms), but for the 500-element list the nested sapply solution is faster (median 815.9 ms vs. 930.7 ms).
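Not part of the original thread: because foo is one minus the cosine similarity, the whole matrix can also be computed with matrix algebra, with no per-pair R function calls. A sketch, assuming (as the question states) that all vectors have the same length; entries involving an all-zero vector come out NaN, just as with foo. The code checks the result against both approaches benchmarked above. It has not been benchmarked here, but it replaces n^2 R-level calls with a few matrix operations, so it should scale much better for large n. It only works because foo has this particular form; for an arbitrary pairwise function, the nested sapply or outer versions remain the general tools.

```r
foo <- function(x, y) 1 - sum(x * y) / sqrt(sum(x ^ 2) * sum(y ^ 2))

set.seed(1)
l <- lapply(1:50, function(i) sample(1:9, 5, replace = TRUE))

# Stack the list into an n x k matrix, one row per vector
m <- do.call(rbind, l)

# tcrossprod(m) is m %*% t(m), so entry [i, j] is sum(l[[i]] * l[[j]]);
# outer(sq, sq) holds sum(l[[i]]^2) * sum(l[[j]]^2) for every pair at once
sq <- rowSums(m ^ 2)
res_vec <- 1 - tcrossprod(m) / sqrt(outer(sq, sq))

# Reference results from the two approaches compared in the answer
res_sapply <- sapply(l, function(x) sapply(l, function(y) foo(x, y)))
res_outer  <- outer(l, l, Vectorize(foo))
```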

answered Oct 19 '22 at 15:10 by cr1msonB1ade