Suppose I have a vector c(1, 2, 3, 4) with no duplicated values. I need the vector c(1 * 2, 1 * 3, 1 * 4, 2 * 3, 2 * 4, 3 * 4), i.e. the multiplication done over all possible combinations of this vector's values. Is there a way of doing that? Thanks in advance!
This is a fun one. I thought combn(1:4, 2, "*") would be the simplest solution, but it actually does not work. We have to use combn(1:4, 2, prod), as Onyambu commented. The issue is: in an "N choose K" setting, FUN must be able to take a length-K vector as input, and "*" is not such a function.
## K = 2 case
"*"(c(1, 2)) ## this is different from: "*"(1, 2)
#Error in *c(1, 2) : invalid unary operator
prod(c(1, 2))
#[1] 2
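Applied to the OP's vector, this gives exactly the requested result:
combn(c(1, 2, 3, 4), 2, prod)
#[1]  2  3  4  6  8 12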
Thanks to Maurits Evers for elaborating on the outer / lower.tri / upper.tri approach; a sketch of it is shown below.
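A minimal sketch of that approach (my own illustration, not part of the original comment; note that the pair ordering differs from combn's, although the same six products appear):
vec <- c(1, 2, 3, 4)
m <- outer(vec, vec)  ## full 4 x 4 product matrix (the temporary object in question)
m[upper.tri(m)]       ## one copy of each off-diagonal product, in column-major order
#[1]  2  3  6  4  8 12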
Here is an adapted way to avoid generating those temporary matrices from outer and *.tri:
tri_ind <- function (n, lower = TRUE, diag = FALSE) {
  ## (i, j) indices of the lower (or upper) triangular part of an n x n matrix,
  ## optionally including the diagonal
  if (diag) {
    tmp <- n:1
    j <- rep.int(1:n, tmp)
    i <- sequence(tmp) - 1L + j
  } else {
    tmp <- (n - 1):1
    j <- rep.int(1:(n - 1), tmp)
    i <- sequence(tmp) + j
  }
  if (lower) list(i = i, j = j)
  else list(i = j, j = i)
}
vec <- 1:4
ind <- tri_ind(length(vec), FALSE, FALSE)
ind
#$i
#[1] 1 1 1 2 2 3
#
#$j
#[1] 2 3 4 3 4 4
vec[ind[[1]]] * vec[ind[[2]]]
#[1] 2 3 4 6 8 12
The tri_ind function is a wrapper around this answer of mine. It can be used as a fast and memory-efficient alternative to combn(length(vec), 2) or its outer equivalent.
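As a quick sanity check (my own addition), the indices it returns for n = 4 match the columns of combn(4, 2):
ind <- tri_ind(4, FALSE, FALSE)
rbind(ind$i, ind$j)
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]    1    1    1    2    2    3
#[2,]    2    3    4    3    4    4
combn(4, 2)
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]    1    1    1    2    2    3
#[2,]    2    3    4    3    4    4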
Originally I linked an finv function, but it is not good for benchmarking, as it is designed for extracting a few elements from a "dist" object (a collapsed lower triangular matrix). When all elements of the triangular matrix are referenced, its index computation imposes unnecessary overhead, so tri_ind is the better option.
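For context, a small illustration (my own example) of what a "dist" object holds: only the strict lower triangle, stored as a flat vector.
dist(c(1, 2, 3, 4))
#  1 2 3
#2 1
#3 2 1
#4 3 2 1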
library(bench)
## benchmark index generation
bench1 <- function (n) {
  bench::mark("combn" = combn(n, 2),
              "tri_ind" = tri_ind(n, TRUE, FALSE),
              "upper.tri" = which(upper.tri(matrix(0, n, n)), arr.ind = TRUE),
              check = FALSE)
}
## for small problem, `tri_ind` is already the fastest
bench1(100)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:t> <bch:tm> <dbl> <bch:byt> <dbl> <int>
#1 combn 11.6ms 11.9ms 11.9ms 12.59ms 83.7 39.1KB 9 32
#2 tri_ind 189.3µs 205.9µs 194.6µs 4.82ms 4856. 60.4KB 21 1888
#3 upper.tri 618.4µs 635.8µs 624.1µs 968.36µs 1573. 411.7KB 57 584
## `tri_ind` is 10x faster than `upper.tri`, and 100x faster than `combn`
bench1(5000)
# expression min mean median max `itr/sec` mem_alloc n_gc
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#1 combn 30.6s 30.6s 30.6s 30.6s 0.0327 95.4MB 242
#2 tri_ind 231.98ms 259.31ms 259.31ms 286.63ms 3.86 143.3MB 0
#3 upper.tri 3.02s 3.02s 3.02s 3.02s 0.332 953.6MB 4
## benchmark on OP's problem
bench2 <- function (n) {
  vec <- numeric(n)
  bench::mark("combn" = combn(vec, 2, prod),
              "tri_ind" = {ind <- tri_ind(n, FALSE, FALSE);
                           vec[ind[[1]]] * vec[ind[[2]]]},
              "upper.tri" = {m <- outer(vec, vec);
                             c(m[upper.tri(m)])},
              check = FALSE)
}
bench2(100)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:t> <bch:tm> <dbl> <bch:byt> <dbl> <int>
#1 combn 18.6ms 19.2ms 19.1ms 20.55ms 52.2 38.7KB 4 22
#2 tri_ind 386.9µs 432.3µs 395.6µs 7.58ms 2313. 176.6KB 1 1135
#3 upper.tri 326.9µs 488.5µs 517.6µs 699.07µs 2047. 336KB 0 1024
bench2(5000)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int>
#1 combn 48.13s 48.13s 48.13s 48.13s 0.0208 95.3MB 204 1
#2 tri_ind 861.7ms 861.7ms 861.7ms 861.7ms 1.16 429.3MB 0 1
#3 upper.tri 1.95s 1.95s 1.95s 1.95s 0.514 810.6MB 3 1
For me, it is interesting to learn that combn is not written in compiled code: it fills its result with an R-level loop. The various alternatives above are just trying to speed up the "N choose 2" case without writing compiled code.
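This is easy to verify from an R session: printing the function shows plain R source rather than a jump into compiled code.
## printing the function shows its R source; there is no .Call / .Internal doing the work
print(utils::combn)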
Better choice??
The combinations function from the gtools package uses a recursive algorithm, which is problematic for large problem sizes. The combn function from the combinat package does not use compiled code either, so it is no better than combn from base R. The RcppAlgos package by Joseph Wood has a comboGeneral function, which is the fastest one I have seen so far.
library(RcppAlgos)
## index generation
bench3 <- function (n) {
  bench::mark("tri_ind" = tri_ind(n, FALSE, FALSE),
              "Joseph" = comboGeneral(n, 2), check = FALSE)
}
bench3(5000)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int>
#1 tri_ind 290ms 297ms 297ms 303ms 3.37 143.4MB 4 2
#2 Joseph 134ms 155ms 136ms 212ms 6.46 95.4MB 2 4
## on OP's problem
bench4 <- function (n) {
  vec <- numeric(n)
  bench::mark("tri_ind" = {ind <- tri_ind(n, FALSE, FALSE);
                           vec[ind[[1]]] * vec[ind[[2]]]},
              "Joseph" = comboGeneral(vec, 2, constraintFun = "prod", keepResults = TRUE),
              check = FALSE)
}
bench4(5000)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int>
#1 tri_ind 956ms 956ms 956ms 956ms 1.05 429MB 3 1
#2 Joseph 361ms 362ms 362ms 363ms 2.76 286MB 1 2
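For reference, here is roughly how the same comboGeneral call would look on the OP's original vector (my own illustration; with constraintFun = "prod" and keepResults = TRUE it should return the pairwise combinations together with each product appended as an extra column):
## each row is one combination; the appended column holds its product
comboGeneral(c(1, 2, 3, 4), 2, constraintFun = "prod", keepResults = TRUE)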
Joseph Wood has various answers on combinations / permutations. For example: Faster version of combn.