Take this example array:
set.seed(1)
rows <- 5
cols <- 4
dept <- 3
a <- array(sample(1:100, rows*cols*dept), dim = c(rows, cols, dept))
returning
> a
, , 1
[,1] [,2] [,3] [,4]
[1,] 68 43 85 73
[2,] 39 14 21 79
[3,] 1 82 54 37
[4,] 34 59 74 83
[5,] 87 51 7 97
, , 2
[,1] [,2] [,3] [,4]
[1,] 44 96 72 99
[2,] 84 42 80 91
[3,] 33 38 40 75
[4,] 35 20 69 6
[5,] 70 28 25 24
, , 3
[,1] [,2] [,3] [,4]
[1,] 32 22 100 50
[2,] 94 92 62 65
[3,] 2 90 23 11
[4,] 45 98 67 17
[5,] 18 64 49 36
For each "dept" dimension, I want to get the sum over the rows, while keeping the original three dimensions of the array. I tried
b <- apply(a, c(2,3), sum)
> b
[,1] [,2] [,3]
[1,] 229 266 191
[2,] 249 224 366
[3,] 241 286 301
[4,] 369 295 179
which gives the correct result but reduces it to a 4 by 3 matrix since the row dimension is collapsed to 1 and is no longer strictly needed. However, for my calculations it is inconvenient when dimension interpretations changes every time I perform an operation so I want to obtain a 1x4x3 array instead:
c <- array(b, dim = c(1, 4, 3))
> c
, , 1
[,1] [,2] [,3] [,4]
[1,] 229 249 241 369
, , 2
[,1] [,2] [,3] [,4]
[1,] 266 224 286 295
, , 3
[,1] [,2] [,3] [,4]
[1,] 191 366 301 179
This accomplishes what I want but I think it is a bit cumbersome and I am not sure how to generalize it to different operations on any number of dimensions. There has to be a more compact way of doing these operations. I found the ``rray` package but it is not compatible with R 4.0.2. Note that my actual arrays are much larger than this example and I will have to apply these types of operations many times in a numerical optimization problem, so computing efficiency is important.
To generalize and keep calculations in one line you could do:
array(apply(a, 2:3, sum), c(1, dim(a)[-1]))
# , , 1
#
# [,1] [,2] [,3] [,4]
# [1,] 229 249 241 369
#
# , , 2
#
# [,1] [,2] [,3] [,4]
# [1,] 266 224 286 295
#
# , , 3
#
# [,1] [,2] [,3] [,4]
# [1,] 191 366 301 179
Or, since it's vectorized and thus much faster, using colSums
array(colSums(a, dims=1), c(1, dim(a)[-1]))
# , , 1
#
# [,1] [,2] [,3] [,4]
# [1,] 229 249 241 369
#
# , , 2
#
# [,1] [,2] [,3] [,4]
# [1,] 266 224 286 295
#
# , , 3
#
# [,1] [,2] [,3] [,4]
# [1,] 191 366 301 179
Benchmark:
set.seed(42)
A <- array(rnorm(5e4*100*10), dim=c(5e4, 100, 10))
library(rray)
microbenchmark::microbenchmark(apply=array(apply(A, 2:3, sum), c(1, dim(A)[-1])),
colSums=array(colSums(A, dims=1), c(1, dim(A)[-1])),
rray_sum=rray_sum(A, 1)) ## rray: see other answer
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# apply 1273.51152 1381.72037 1416.33429 1395.84693 1433.72407 1848.88436 100 b
# colSums 72.07086 73.02890 73.85052 73.63013 74.38916 79.70227 100 a
# rray_sum 71.46261 72.50294 73.27564 73.00747 73.70348 80.36409 100 a
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With