Fastest way to apply function to all pairwise combinations of columns

Question

Given a data frame or matrix with an arbitrary number of rows and columns, what is the fastest way to apply a function to all pairwise combinations of columns?

For example, suppose I have a data table:

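library(data.table)

N <- 3
K <- 3
data <- data.table(id=seq(N))
for(k in seq(K)) {
    # note: the first iteration (k = 1) overwrites the id column with random values
    data[[k]] <- runif(N)
}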

If I want to compute the simple difference between all pairs of columns, I could loop (or lapply) over the columns:

differences <- data.table(foo = seq(N))
cols <- names(data)
for (i in seq_along(cols)) {
    for (j in seq_along(cols)) {
        if (i >= j) next  # skip self-pairs and pairs already handled in the other order
        combo <- paste0(cols[i], cols[j])
        differences[[combo]] <- data[[cols[i]]] - data[[cols[j]]]
    }
}

But as K gets larger, this becomes absurdly slow.

One solution I've considered is to make two new data tables using combn and subtract them:

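pairs <- combn(colnames(data), 2)      # compute the column pairs once
a <- data[, pairs[1, ], with = FALSE]  # first column of each pair
b <- data[, pairs[2, ], with = FALSE]  # second column of each pair
differences <- a - b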

But as N and K get larger, this becomes very memory intensive (though faster than looping).

It seems to me that the outer product of the matrix with itself is probably the best way to go, but I can't piece it together. This is especially hard if I want to apply an arbitrary function (RMSE for example), instead of just the difference.

What's the fastest way?

Alexander Radev · Accepted Answer

If it is necessary to have the data in a matrix first, you can do the following:

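library(data.table)

# Example input: a 300 x 500 numeric matrix
data <- matrix(runif(300*500), nrow = 300, ncol = 500)

# Reshape to long format: one row per cell, keyed by column index.
# c(data) flattens the matrix column by column and becomes the unnamed column V1.
data.DT <- setkey(data.table(c(data), colId = rep(1:500, each = 300), rowId = rep(1:300, times = 500)), colId)

# For each column (col1), subtract its values from every later column (col2)
diff.DT <- data.DT[
  , {
    ccl <- unique(colId)   # index of the current column
    vv <- V1               # values of the current column
    # vv is recycled across all rows with colId > ccl, so this is col2 minus col1
    data.DT[colId > ccl, .(col2 = colId, V1 - vv)]
  }
  , keyby = .(col1 = colId)
]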
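
The question also asks about applying an arbitrary function (RMSE, for example) rather than a plain difference. The following is only a minimal sketch of one way to do that on a matrix, and it is not part of the approach above: it enumerates column pairs with base R's combn and calls the function once per pair. The names pairwise_apply and rmse are hypothetical helpers made up for this illustration, and the sketch assumes the function returns a single number per pair. It has not been benchmarked against the data.table version.

```r
# Hypothetical helper (an assumption for illustration, not from the answer):
# apply f to every unordered pair of columns of a numeric matrix.
pairwise_apply <- function(m, f) {
  pairs <- combn(ncol(m), 2)   # 2 x choose(K, 2) matrix of column indices
  out <- vapply(
    seq_len(ncol(pairs)),
    function(p) f(m[, pairs[1, p]], m[, pairs[2, p]]),
    numeric(1)                 # assumes f returns one number per pair
  )
  # label each result with the two column indices, e.g. "1_2"
  names(out) <- paste(pairs[1, ], pairs[2, ], sep = "_")
  out
}

# Root-mean-square error between two equal-length numeric vectors
rmse <- function(x, y) sqrt(mean((x - y)^2))

set.seed(1)
m <- matrix(runif(300 * 5), nrow = 300, ncol = 5)
res <- pairwise_apply(m, rmse)
length(res)   # choose(5, 2) = 10 pairs
```

Because only one pair of columns is held at a time, this avoids building the two wide intermediate tables from the combn approach in the question, at the cost of one function call per pair.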

Tags: r, data.table, dmp
