Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Product between all combinations of a vector's elements

Suppose I have a vector c(1, 2, 3, 4) with no duplicated values. I need a vector c(1 * 2, 1 * 3, 1 * 4, 2 * 3, 2 * 4, 3 * 4), so the multiplication is done in all possible combinations of this vector's values. Is there a way of doing that? Thanks in advance!

like image 983
Ben Z Avatar asked Jan 02 '23 22:01

Ben Z


1 Answers

This is fun enough. I thought combn(1:4, 2, "*") would be the simplest solution but it actually does not work. We have to use combn(1:4, 2, prod) as Onyambu commented. The issue is: in an "N choose K" setting, FUN must be able to take a vector of length-K as input. "*" is not the right one.

## K = 2 case
"*"(c(1, 2))  ## this is different from: "*"(1, 2)
#Error in *c(1, 2) : invalid unary operator

prod(c(1, 2))
#[1] 2

We are going too far, but we will meet this sooner or later

Thanks Maurits Evers for elaboration on outer / lower.tri / upper.tri. Here is an adapted way to avoid generating those temporary matrices from outer and *****.tri:

tri_ind <- function (n, lower= TRUE, diag = FALSE) {
  if (diag) {
    tmp <- n:1
    j <- rep.int(1:n, tmp)
    i <- sequence(tmp) - 1L + j
    } else {
    tmp <- (n-1):1
    j <- rep.int(1:(n-1), tmp)
    i <- sequence(tmp) + j
    }
  if (lower) list(i = i, j = j)
  else list(i = j, j = i)
  }

vec <- 1:4
ind <- tri_ind(length(vec), FALSE, FALSE)
#$i
#[1] 1 1 1 2 2 3
#
#$j
#[1] 2 3 4 3 4 4

vec[ind[[1]]] * vec[ind[[2]]]
#[1]  2  3  4  6  8 12

The tri_ind function is wrapper of my this answer. It could be used as a fast and memory-efficient alternative for combn(length(vec), 2) or its outer-equivalence.

Originally I linked an finv function but it is not good for benchmarking, as it is designed for extracting a few elements from a "dist" object (a collapsed lower triangular matrix). If all elements of the triangular matrix are referenced, its index computation actually imposes unnecessary overhead. tri_ind is a better option.

library(bench)

benchmark index generation

bench1 <- function (n) {
  bench::mark("combn" = combn(n, 2),
              "tri_ind" = tri_ind(n, TRUE, FALSE),
              "upper.tri" = which(upper.tri(matrix(0, n, n)), arr.ind = TRUE),
              check = FALSE)
  }

## for small problem, `tri_ind` is already the fastest
bench1(100)
#  expression      min     mean  median      max `itr/sec` mem_alloc  n_gc n_itr
#  <chr>      <bch:tm> <bch:tm> <bch:t> <bch:tm>     <dbl> <bch:byt> <dbl> <int>
#1 combn        11.6ms   11.9ms  11.9ms  12.59ms      83.7    39.1KB     9    32
#2 tri_ind     189.3µs  205.9µs 194.6µs   4.82ms    4856.     60.4KB    21  1888
#3 upper.tri   618.4µs  635.8µs 624.1µs 968.36µs    1573.    411.7KB    57   584

## `tri_ind` is 10x faster than `upper.tri`, and 100x faster than `combn`
bench1(5000)
#  expression      min     mean   median      max `itr/sec` mem_alloc  n_gc
#  <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
#1 combn         30.6s    30.6s    30.6s    30.6s    0.0327    95.4MB   242
#2 tri_ind    231.98ms 259.31ms 259.31ms 286.63ms    3.86     143.3MB     0
#3 upper.tri     3.02s    3.02s    3.02s    3.02s    0.332    953.6MB     4

benchmark on OP's problem

bench2 <- function (n) {
  vec <- numeric(n)
  bench::mark("combn" = combn(vec, 2, prod),
              "tri_ind" = {ind <- tri_ind(n, FALSE, FALSE);
                           vec[ind[[1]]] * vec[ind[[2]]]},
              "upper.tri" = {m <- outer(vec, vec);                                
                             c(m[upper.tri(m)])},
              check = FALSE)
  }

bench2(100)
#  expression      min     mean  median      max `itr/sec` mem_alloc  n_gc n_itr
#  <chr>      <bch:tm> <bch:tm> <bch:t> <bch:tm>     <dbl> <bch:byt> <dbl> <int>
#1 combn        18.6ms   19.2ms  19.1ms  20.55ms      52.2    38.7KB     4    22
#2 tri_ind     386.9µs  432.3µs 395.6µs   7.58ms    2313.    176.6KB     1  1135
#3 upper.tri   326.9µs  488.5µs 517.6µs 699.07µs    2047.      336KB     0  1024

bench2(5000)
#  expression      min     mean   median     max `itr/sec` mem_alloc  n_gc n_itr
#  <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:t>     <dbl> <bch:byt> <dbl> <int>
#1 combn        48.13s   48.13s   48.13s  48.13s    0.0208    95.3MB   204     1
#2 tri_ind     861.7ms  861.7ms  861.7ms 861.7ms    1.16     429.3MB     0     1
#3 upper.tri     1.95s    1.95s    1.95s   1.95s    0.514    810.6MB     3     1

For me, it is interesting to know that combn is not written in compiled code. It actually has an R-level for loop inside. The various alternatives is just trying to speed it up in "N choose 2" case without writing compiled code.

Better choice??

Function combinations from gtools package uses recursive algorithm, which is problematic for big problem size. Function combn from combinat package does not use compiled code so it is no better than combn from R core. The RcppAlgos package by Joseph Wood has a comboGenearl function which is the fastest one I see so far.

library(RcppAlgos)

## index generation
bench3 <- function (n) {
  bench::mark("tri_ind" = tri_ind(n, FALSE, FALSE),
              "Joseph" = comboGeneral(n, 2), check = FALSE)
  }

bench3(5000)
#  expression      min     mean   median     max `itr/sec` mem_alloc  n_gc n_itr
#  <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:t>     <dbl> <bch:byt> <dbl> <int>
#1 tri_ind       290ms    297ms    297ms   303ms      3.37   143.4MB     4     2
#2 Joseph        134ms    155ms    136ms   212ms      6.46    95.4MB     2     4

## on OP's problem
bench4 <- function (n) {
  vec <- numeric(n)
  bench::mark("tri_ind" = {ind <- tri_ind(n, FALSE, FALSE);
                           vec[ind[[1]]] * vec[ind[[2]]]},
              "Joseph" = comboGeneral(vec, 2, constraintFun = "prod", keepResults = TRUE),
              check = FALSE)
  }

bench4(5000)
#  expression      min     mean   median     max `itr/sec` mem_alloc  n_gc n_itr
#  <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:t>     <dbl> <bch:byt> <dbl> <int>
#1 tri_ind       956ms    956ms    956ms   956ms      1.05     429MB     3     1
#2 Joseph        361ms    362ms    362ms   363ms      2.76     286MB     1     2

Joseph Wood has various answers on combinations / permutations. For example: Faster version of combn.

like image 159
Zheyuan Li Avatar answered Jan 05 '23 05:01

Zheyuan Li