Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiplication of matrices along rows to get an array

I have a 2x4 matrix A and a 3x4 matrix B. I would like to get an array C of dimensions (2, 3, 4) where the ijk entry of C is the product of the ik entry of A and the jk entry of B.

Is there a fast way to do this in R by avoiding a loop? Example below with two ways to calculate what I am looking for -- both of which involve for loops

A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

C1 <- array(NA, dim=c(2, 3, 4))
for (ii in 1:2) {
  for (jj in 1:3) {
    C1[ii, jj, ] <- A[ii, ] * B[jj, ]
  }
}

C2 <- array(NA, dim=c(2, 3, 4))
for (ss in 1:4) {
  C2[, , ss] <- outer(A[, ss], B[, ss])
}
like image 881
GPa Avatar asked Oct 14 '25 04:10

GPa


2 Answers

Not much of an improvement, but using abind:

C3 <- do.call(abind::abind, c(lapply(seq(ncol(A)), function(ss) outer(A[,ss], B[,ss])), along=3))

Benchmark

NOTE: the previous time I ran this benchmark was on a different laptop running R-4.0.5, and in that case, running the benchmark several times, C1's performance as on par with C2's. I switched laptops (for other reasons), saw @jay.sf's answer and wanted to add it to the fray, and now C1 is significantly faster. I cannot explain the different, but the relevant specs:

  • previous benchmark: windows 10, R-4.0.5
  • this benchmark: windows 11, R-4.1.2

It seems to me that the "turbulence" in the benchmarks is due to the small data size, influenced heavily by administrative overhead. If the matrices are much larger, I would expect better breakout (without verification).

bench::mark(
  C1 = {
  for (ii in 1 : 2) {
    for (jj in 1 : 3) {
      C1[ii, jj, ] <- A[ii, ] * B[jj, ]
    }
  }
  as.numeric(C1)
},
  C2 = {
  for (ss in 1 : 4) {
    C2[, , ss] <- outer(A[, ss], B[, ss])
  }
  as.numeric(C2)
},
  C3 = as.numeric(do.call(abind::abind, c(lapply(seq(ncol(A)), function(ss) outer(A[,ss], B[,ss])), along=3))),
  C4 = as.numeric(vapply(1:4, \(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3)))
)
# # A tibble: 4 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result     memory             time                gc                   
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>     <list>             <list>              <list>               
# 1 C1           16.6us   52.3us    18669.      240B     0     9317     0      499ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [9,317]>  <tibble [9,317 x 3]> 
# 2 C2           25.8us   85.6us    11225.      240B     2.14  5235     1      466ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [5,236]>  <tibble [5,236 x 3]> 
# 3 C3            763us  797.4us     1144.      720B     0      572     0      500ms <dbl [24]> <Rprofmem [3 x 3]> <bench_tm [572]>    <tibble [572 x 3]>   
# 4 C4           24.8us   36.7us    21752.      240B     2.18  9999     1      460ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [10,000]> <tibble [10,000 x 3]>

(I added as.numeric to each of them because some return ints, some doubles, and I didn't want to as.numeric one and bias the benchmark. It's not strictly required for any of them, but now we can be assured that they are all equal, since otherwise bench::mark would fail, complaining that the output differs.)

I like @jay.sf's answer because it is both fast and does not require additional packages (abind is non-standard but handy to have nonetheless).


Benchmark, take 2

Let's grow the data a little.

A2 <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, A, simplify = FALSE)), simplify = FALSE))
Abig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, A, simplify = FALSE)), simplify = FALSE))
Bbig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, B, simplify = FALSE)), simplify = FALSE))
C1big <- C2big <- array(dim=c(dim(Abig)[1], dim(Bbig)))
dim(Abig)
# [1] 100 200

Now a different benchmark, now on windows 11, R-4.1.2:

bench::mark(
  C1big = {
  for (ii in seq(nrow(Abig))) {
    for (jj in seq(nrow(Bbig))) {
      C1big[ii, jj, ] <- Abig[ii, ] * Bbig[jj, ]
    }
  }
  as.numeric(C1big)
},
  C2big = {
  for (ss in seq(ncol(Abig))) {
    C2big[, , ss] <- outer(Abig[, ss], Bbig[, ss])
  }
  as.numeric(C2big)
},
  C3big = as.numeric(do.call(abind::abind, c(lapply(seq(ncol(Abig)), function(ss) outer(Abig[,ss], Bbig[,ss])), along=3))),
  C4big = as.numeric(vapply(seq(ncol(Abig)), function(ss) outer(Abig[, ss], Bbig[, ss]), matrix(0, nrow(Abig), nrow(Bbig)))),
  iterations = 30
)
# # A tibble: 4 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result            memory                  time            gc               
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>            <list>                  <list>          <list>           
# 1 C1big       100.7ms    135ms      7.38    83.5MB     2.68    22     8      2.98s <dbl [3,000,000]> <Rprofmem [75,001 x 3]> <bench_tm [30]> <tibble [30 x 3]>
# 2 C2big          22ms   24.4ms     36.0     46.8MB     7.20    25     5    694.7ms <dbl [3,000,000]> <Rprofmem [1,801 x 3]>  <bench_tm [30]> <tibble [30 x 3]>
# 3 C3big        26.5ms   28.4ms     32.8     92.6MB    28.7     16    14   488.54ms <dbl [3,000,000]> <Rprofmem [1,632 x 3]>  <bench_tm [30]> <tibble [30 x 3]>
# 4 C4big          10ms   17.4ms     61.2     46.7MB    18.6     23     7    375.9ms <dbl [3,000,000]> <Rprofmem [1,402 x 3]>  <bench_tm [30]> <tibble [30 x 3]>

It seems like we're getting to where I would expect: jay.sf's vapply implementation should do very well, and it appears to be breaking out with `itr/sec` and mem_alloc being good indicators.

enter image description here

like image 154
r2evans Avatar answered Oct 16 '25 18:10

r2evans


vapply() should be fast.

vapply(1:4, \(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3))
# , , 1
# 
# [,1] [,2] [,3]
# [1,]   11   12   13
# [2,]   22   24   26
# 
# , , 2
# 
# [,1] [,2] [,3]
# [1,]   42   45   48
# [2,]   56   60   64
# 
# , , 3
# 
# [,1] [,2] [,3]
# [1,]   85   90   95
# [2,]  102  108  114
# 
# , , 4
# 
# [,1] [,2] [,3]
# [1,]  140  147  154
# [2,]  160  168  176

sapply is also possible, but slower (same output):

sapply(1:4, \(ss) outer(A[, ss], B[, ss]), simplify='array')

Benchmark:

AA <- matrix(1:8, 1e3, 4e2)
BB <- matrix(11:22, 2e3, 4e2)
microbenchmark::microbenchmark(
  `for1`={C1 <- array(NA, dim=c(1e3, 2e3, 4e2))
  for (ii in 1:2) {
    for (jj in 1:3) {
      C1[ii, jj, ] <- AA[ii, ] * BB[jj, ]
    }
  }},
  vapply=vapply(1:4, \(ss) outer(AA[, ss], BB[, ss]), matrix(0, 1e3, 2e3)),
  abind=do.call(abind::abind, c(lapply(seq(ncol(AA)), function(ss) outer(AA[,ss], BB[,ss])), along=3)),
  sapply=sapply(1:4, \(ss) outer(AA[, ss], BB[, ss]), simplify='array'),
  times=3L, control=list(warmup=1e2L))
# Unit: milliseconds
#   expr         min          lq        mean     median         uq        max neval cld
#   for1  3766.87037  3792.71469  3837.77384  3818.5590  3873.2256  3927.8921     3  b 
# vapply    34.04493    68.17596    98.97098   102.3070   131.4340   160.5610     3 a  
#  abind 11736.37882 12063.20849 12320.85601 12390.0382 12613.0946 12836.1511     3   c
# sapply    58.41669    81.65372   139.44338   104.8907   179.9567   255.0227     3 a  
like image 37
jay.sf Avatar answered Oct 16 '25 18:10

jay.sf



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!