Optimise which.max along multiple dimensions of an array

Tags:

multidimensional-array

r

I have some code with a 4 dimensional array, and I need to apply which.max across multiple dimensions. Its quite slow, I'd like to find ways to speed it up.

Example:

library(microbenchmark)

array4d <- array( runif(5*500*50*5 ,-1,0),
                  dim = c(5, 500, 50, 5) )

microbenchmark(
    max_idx <- apply(array4d, c(1,2,3), which.max )
    )

Any tips appreciated, thanks!

Edit: I've managed to make it marginally faster (though ugly) by directly coding it out in for loops - but I'm hoping someone out there has better ideas!

method1 <- function(z) {
    apply(z, c(1,2,3), which.max)
}
method2 <- function(z){
    result <- array( , dim = dim(z)[1:3] )
    for(i in 1:dim(z)[1]){
        for(j in 1:dim(z)[2]){
            for(k in 1:dim(z)[3]){
                result[i, j, k] <- which.max(z[i,j,k,])
            }
        }
    }
    return(result)
}

microbenchmark(
result1 <- method1(array4d),
result2 <- method2(array4d))

> microbenchmark(
+     result1 <- method1(array4d),
+     result2 <- method2(array4d)
+ )
Unit: milliseconds
                        expr      min       lq     mean   median       uq      max neval cld
 result1 <- method1(array4d) 111.9061 140.1400 165.2441 155.6773 170.3967 384.6425   100   b
 result2 <- method2(array4d) 113.4572 123.2429 136.8583 130.8505 141.9620 215.0968   100  a

646

asked Jul 14 '20 12:07

user2498193

1 Answers

Adding to more methods. One using R the other tuning code from @Allan-Cameron:

method4 <- function(z){
  result <- array(integer(1) , dim = head(dim(z), -1))
  n <- prod(head(dim(z), -1))
  j <- seq_len(tail(dim(z),1)) * n - n
  for(i in seq_len(n)) result[i] <- which.max(z[i+j])
  result}

Rcpp::cppFunction("
NumericVector method5(const NumericVector &input){
  std::vector<int> dims = input.attr(\"dim\");
  int last_dim = dims.back();
  int diff = input.size()/last_dim;
  std::vector<int> result(diff);
  dims.pop_back();
  
  for(int i = 0; i < diff; ++i)
  {
    double max_val = input[i];
    int max_ind = 0;
    for(int j = 0; j < last_dim; ++j)
    {
      if(input[i+j*diff] > max_val) {
        max_val = input[i+j*diff];
        max_ind = j;
      }
    }
    result[i] = max_ind + 1;
  }
  
  NumericVector arr = wrap(result);
  arr.attr(\"dim\") = dims;
  return arr ;
}"
)

Timings:

set.seed(42)
array4d <- array(runif(5*500*50*5, -1, 0), dim = c(5, 500, 50, 5))

library(microbenchmark)
microbenchmark(
  check = "equal", control=list(order="block")
, method1(array4d)         #Using code from Question
, method2(array4d)         #Using code from Question
, apply_which_max(array4d) #Using code from Allan Cameron
, method4(array4d)
, method5(array4d)
)
#Unit: microseconds
#                     expr        min         lq        mean      median          uq        max neval  cld
#         method1(array4d) 200857.804 228567.850 266815.6275 254530.3050 294578.1125 423838.879   100    d
#         method2(array4d) 144767.680 149616.981 162367.6556 150688.1860 182290.4980 315650.052   100   c 
# apply_which_max(array4d)   3131.482   3153.712   3346.1025   3175.9445   3206.2220   5922.866   100 a   
#         method4(array4d)  58618.275  60777.584  62334.8258  61198.1815  61702.2170 165254.042   100  b  
#         method5(array4d)    894.823    902.862    972.2953    911.9845    927.0885   2643.957   100 a

For random select, instead of first match:

Rcpp::cppFunction("
NumericVector method6(const NumericVector &input){
  std::srand(std::time(nullptr));
  std::vector<int> dims = input.attr(\"dim\");
  int last_dim = dims.back();
  int diff = input.size()/last_dim;
  std::vector<int> result(diff);
  dims.pop_back();
  
  for(int i = 0; i < diff; ++i)
  {
    double max_val = input[i];
    std::vector<int> max_ind = {0};
    for(int j = 1; j < last_dim; ++j)
    {
      if(input[i+j*diff] > max_val) {
        max_val = input[i+j*diff];
        max_ind.clear();
        max_ind.push_back(j);
      } else if(input[i+j*diff] == max_val) max_ind.push_back(j);
    }
    result[i] = max_ind[std::rand() % max_ind.size()] + 1;
  }
  
  NumericVector arr = wrap(result);
  arr.attr(\"dim\") = dims;
  return arr ;
}"
)

102

answered Oct 20 '22 18:10

GKi

Related questions
                            
                                How to flatten non atomic function results so that can be assigned as part of a dplyr mutate step?
                            
                                R max function returns pseudo values when used within 'dplyr'
                            
                                How to rbind() / dplyr::bind_rows() / data.table::rbindlist() data frames which contain data frame columns?
                            
                                RStudio README.Rmd and README.md should be both staged use 'git commit --no-verify' to override this check
                            
                                R Draws Plots with Rectangles Instead of Text
                            
                                How to kill own Oracle SQL sessions without DBA privileges?
                            
                                Is there an expand.grid like function with matrix output
                            
                                How do you position the title and legend in tmap?
                            
                                ggiraph plot not appearing in shiny app, but works in RStudio
                            
                                Performing operations with lag on a dataframe to calculate a new value in R [duplicate]
                            
                                Call to weight in lm() within function doesn't evaluate properly
                            
                                Accessing a table in a stack overflow (SO) question to use as a dataframe for an answer [duplicate]
                            
                                How to Retain the values in the Check Box in R Shiny App?
                            
                                Rcpp proxy model and R memory allocation
                            
                                ggplot2: how to assign value of variable to ggplot title
                            
                                Display layers at certain zoom levels in R Leaflet
                            
                                R install.packages returns 'ERROR: failed to lock directory'
                            
                                Pandas equivalent of dplyr everything()
                            
                                Timestamp R Sequence Milliseconds
                            
                                Tidy Eval: Using {{var}} inside a nesting function not possible?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With