Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimise which.max along multiple dimensions of an array

I have some code with a 4 dimensional array, and I need to apply which.max across multiple dimensions. Its quite slow, I'd like to find ways to speed it up.

Example:

library(microbenchmark)

array4d <- array( runif(5*500*50*5 ,-1,0),
                  dim = c(5, 500, 50, 5) )

microbenchmark(
    max_idx <- apply(array4d, c(1,2,3), which.max )
    )

Any tips appreciated, thanks!

Edit: I've managed to make it marginally faster (though ugly) by directly coding it out in for loops - but I'm hoping someone out there has better ideas!

method1 <- function(z) {
    apply(z, c(1,2,3), which.max)
}
method2 <- function(z){
    result <- array( , dim = dim(z)[1:3] )
    for(i in 1:dim(z)[1]){
        for(j in 1:dim(z)[2]){
            for(k in 1:dim(z)[3]){
                result[i, j, k] <- which.max(z[i,j,k,])
            }
        }
    }
    return(result)
}

microbenchmark(
result1 <- method1(array4d),
result2 <- method2(array4d))

> microbenchmark(
+     result1 <- method1(array4d),
+     result2 <- method2(array4d)
+ )
Unit: milliseconds
                        expr      min       lq     mean   median       uq      max neval cld
 result1 <- method1(array4d) 111.9061 140.1400 165.2441 155.6773 170.3967 384.6425   100   b
 result2 <- method2(array4d) 113.4572 123.2429 136.8583 130.8505 141.9620 215.0968   100  a 
like image 646
user2498193 Avatar asked Jul 14 '20 12:07

user2498193


People also ask

How to compute the maximum over all dimensions of an array?

Create a 3-D array and compute the maximum over each page of data (rows and columns). Starting in R2018b, to compute the maximum over all dimensions of an array, you can either specify each dimension in the vector dimension argument, or use the 'all' option.

What is the maximum size of an array in Python?

Maximum values, returned as a scalar, vector, matrix, or multidimensional array. size(M,dim) is 1, while the sizes of all other dimensions match the size of the corresponding dimension in A, unless size(A,dim) is 0.

How do I aggregate an array with more than one dimension?

If we aggregate a one-dimensional array, we get a single point. If we aggregate a two-dimensional array, we get a one-dimensional array and so on. If we have more than one dimensions, we must decide which dimension (or axis) we want to eliminate. In NumPy this can be done by specifying the axis parameter in the function call.

How to find the maximum of J-I in an array?

A common technique with array problems is to have two pointers. As we need to find the maximum of j - i, let’s try to initialize the pointers at the start and the end of the array. If we already get A [i] <= A [j], we don’t even have to iterate and we can return j - i which is equals to size (A) - 1.


1 Answers

Adding to more methods. One using R the other tuning code from @Allan-Cameron:

method4 <- function(z){
  result <- array(integer(1) , dim = head(dim(z), -1))
  n <- prod(head(dim(z), -1))
  j <- seq_len(tail(dim(z),1)) * n - n
  for(i in seq_len(n)) result[i] <- which.max(z[i+j])
  result}
Rcpp::cppFunction("
NumericVector method5(const NumericVector &input){
  std::vector<int> dims = input.attr(\"dim\");
  int last_dim = dims.back();
  int diff = input.size()/last_dim;
  std::vector<int> result(diff);
  dims.pop_back();
  
  for(int i = 0; i < diff; ++i)
  {
    double max_val = input[i];
    int max_ind = 0;
    for(int j = 0; j < last_dim; ++j)
    {
      if(input[i+j*diff] > max_val) {
        max_val = input[i+j*diff];
        max_ind = j;
      }
    }
    result[i] = max_ind + 1;
  }
  
  NumericVector arr = wrap(result);
  arr.attr(\"dim\") = dims;
  return arr ;
}"
)

Timings:

set.seed(42)
array4d <- array(runif(5*500*50*5, -1, 0), dim = c(5, 500, 50, 5))

library(microbenchmark)
microbenchmark(
  check = "equal", control=list(order="block")
, method1(array4d)         #Using code from Question
, method2(array4d)         #Using code from Question
, apply_which_max(array4d) #Using code from Allan Cameron
, method4(array4d)
, method5(array4d)
)
#Unit: microseconds
#                     expr        min         lq        mean      median          uq        max neval  cld
#         method1(array4d) 200857.804 228567.850 266815.6275 254530.3050 294578.1125 423838.879   100    d
#         method2(array4d) 144767.680 149616.981 162367.6556 150688.1860 182290.4980 315650.052   100   c 
# apply_which_max(array4d)   3131.482   3153.712   3346.1025   3175.9445   3206.2220   5922.866   100 a   
#         method4(array4d)  58618.275  60777.584  62334.8258  61198.1815  61702.2170 165254.042   100  b  
#         method5(array4d)    894.823    902.862    972.2953    911.9845    927.0885   2643.957   100 a   

For random select, instead of first match:

Rcpp::cppFunction("
NumericVector method6(const NumericVector &input){
  std::srand(std::time(nullptr));
  std::vector<int> dims = input.attr(\"dim\");
  int last_dim = dims.back();
  int diff = input.size()/last_dim;
  std::vector<int> result(diff);
  dims.pop_back();
  
  for(int i = 0; i < diff; ++i)
  {
    double max_val = input[i];
    std::vector<int> max_ind = {0};
    for(int j = 1; j < last_dim; ++j)
    {
      if(input[i+j*diff] > max_val) {
        max_val = input[i+j*diff];
        max_ind.clear();
        max_ind.push_back(j);
      } else if(input[i+j*diff] == max_val) max_ind.push_back(j);
    }
    result[i] = max_ind[std::rand() % max_ind.size()] + 1;
  }
  
  NumericVector arr = wrap(result);
  arr.attr(\"dim\") = dims;
  return arr ;
}"
)
like image 102
GKi Avatar answered Oct 20 '22 18:10

GKi