Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apply function on array returning original number of dimensions

Tags:

arrays

r

apply

Take this example array:

set.seed(1)
rows <- 5
cols <- 4
dept <- 3
a <- array(sample(1:100, rows*cols*dept), dim = c(rows, cols, dept))

returning

> a
, , 1

     [,1] [,2] [,3] [,4]
[1,]   68   43   85   73
[2,]   39   14   21   79
[3,]    1   82   54   37
[4,]   34   59   74   83
[5,]   87   51    7   97

, , 2

     [,1] [,2] [,3] [,4]
[1,]   44   96   72   99
[2,]   84   42   80   91
[3,]   33   38   40   75
[4,]   35   20   69    6
[5,]   70   28   25   24

, , 3

     [,1] [,2] [,3] [,4]
[1,]   32   22  100   50
[2,]   94   92   62   65
[3,]    2   90   23   11
[4,]   45   98   67   17
[5,]   18   64   49   36

For each "dept" dimension, I want to get the sum over the rows, while keeping the original three dimensions of the array. I tried

b <- apply(a, c(2,3), sum)
> b
     [,1] [,2] [,3]
[1,]  229  266  191
[2,]  249  224  366
[3,]  241  286  301
[4,]  369  295  179

which gives the correct result but reduces it to a 4 by 3 matrix since the row dimension is collapsed to 1 and is no longer strictly needed. However, for my calculations it is inconvenient when dimension interpretations changes every time I perform an operation so I want to obtain a 1x4x3 array instead:

c <- array(b, dim = c(1, 4, 3))
> c
, , 1

     [,1] [,2] [,3] [,4]
[1,]  229  249  241  369

, , 2

     [,1] [,2] [,3] [,4]
[1,]  266  224  286  295

, , 3

     [,1] [,2] [,3] [,4]
[1,]  191  366  301  179

This accomplishes what I want but I think it is a bit cumbersome and I am not sure how to generalize it to different operations on any number of dimensions. There has to be a more compact way of doing these operations. I found the ``rray` package but it is not compatible with R 4.0.2. Note that my actual arrays are much larger than this example and I will have to apply these types of operations many times in a numerical optimization problem, so computing efficiency is important.

like image 723
avs Avatar asked Feb 24 '26 21:02

avs


1 Answers

To generalize and keep calculations in one line you could do:

array(apply(a, 2:3, sum), c(1, dim(a)[-1]))
# , , 1
# 
# [,1] [,2] [,3] [,4]
# [1,]  229  249  241  369
# 
# , , 2
# 
# [,1] [,2] [,3] [,4]
# [1,]  266  224  286  295
# 
# , , 3
# 
# [,1] [,2] [,3] [,4]
# [1,]  191  366  301  179

Or, since it's vectorized and thus much faster, using colSums

array(colSums(a, dims=1), c(1, dim(a)[-1]))
# , , 1
# 
# [,1] [,2] [,3] [,4]
# [1,]  229  249  241  369
# 
# , , 2
# 
# [,1] [,2] [,3] [,4]
# [1,]  266  224  286  295
# 
# , , 3
# 
# [,1] [,2] [,3] [,4]
# [1,]  191  366  301  179

Benchmark:

set.seed(42)
A <- array(rnorm(5e4*100*10), dim=c(5e4, 100, 10))

library(rray)
microbenchmark::microbenchmark(apply=array(apply(A, 2:3, sum), c(1, dim(A)[-1])),
                               colSums=array(colSums(A, dims=1), c(1, dim(A)[-1])),
                               rray_sum=rray_sum(A, 1))  ## rray: see other answer
# Unit: milliseconds
#     expr        min         lq       mean     median         uq        max neval cld
#    apply 1273.51152 1381.72037 1416.33429 1395.84693 1433.72407 1848.88436   100   b
#  colSums   72.07086   73.02890   73.85052   73.63013   74.38916   79.70227   100  a 
# rray_sum   71.46261   72.50294   73.27564   73.00747   73.70348   80.36409   100  a 
like image 97
jay.sf Avatar answered Feb 26 '26 09:02

jay.sf