I have some code with a 4 dimensional array, and I need to apply which.max across multiple dimensions. Its quite slow, I'd like to find ways to speed it up.
Example:
library(microbenchmark)
array4d <- array( runif(5*500*50*5 ,-1,0),
dim = c(5, 500, 50, 5) )
microbenchmark(
max_idx <- apply(array4d, c(1,2,3), which.max )
)
Any tips appreciated, thanks!
Edit: I've managed to make it marginally faster (though ugly) by directly coding it out in for loops - but I'm hoping someone out there has better ideas!
method1 <- function(z) {
apply(z, c(1,2,3), which.max)
}
method2 <- function(z){
result <- array( , dim = dim(z)[1:3] )
for(i in 1:dim(z)[1]){
for(j in 1:dim(z)[2]){
for(k in 1:dim(z)[3]){
result[i, j, k] <- which.max(z[i,j,k,])
}
}
}
return(result)
}
microbenchmark(
result1 <- method1(array4d),
result2 <- method2(array4d))
> microbenchmark(
+ result1 <- method1(array4d),
+ result2 <- method2(array4d)
+ )
Unit: milliseconds
expr min lq mean median uq max neval cld
result1 <- method1(array4d) 111.9061 140.1400 165.2441 155.6773 170.3967 384.6425 100 b
result2 <- method2(array4d) 113.4572 123.2429 136.8583 130.8505 141.9620 215.0968 100 a
Create a 3-D array and compute the maximum over each page of data (rows and columns). Starting in R2018b, to compute the maximum over all dimensions of an array, you can either specify each dimension in the vector dimension argument, or use the 'all' option.
Maximum values, returned as a scalar, vector, matrix, or multidimensional array. size(M,dim) is 1, while the sizes of all other dimensions match the size of the corresponding dimension in A, unless size(A,dim) is 0.
If we aggregate a one-dimensional array, we get a single point. If we aggregate a two-dimensional array, we get a one-dimensional array and so on. If we have more than one dimensions, we must decide which dimension (or axis) we want to eliminate. In NumPy this can be done by specifying the axis parameter in the function call.
A common technique with array problems is to have two pointers. As we need to find the maximum of j - i, let’s try to initialize the pointers at the start and the end of the array. If we already get A [i] <= A [j], we don’t even have to iterate and we can return j - i which is equals to size (A) - 1.
Adding to more methods. One using R the other tuning code from @Allan-Cameron:
method4 <- function(z){
result <- array(integer(1) , dim = head(dim(z), -1))
n <- prod(head(dim(z), -1))
j <- seq_len(tail(dim(z),1)) * n - n
for(i in seq_len(n)) result[i] <- which.max(z[i+j])
result}
Rcpp::cppFunction("
NumericVector method5(const NumericVector &input){
std::vector<int> dims = input.attr(\"dim\");
int last_dim = dims.back();
int diff = input.size()/last_dim;
std::vector<int> result(diff);
dims.pop_back();
for(int i = 0; i < diff; ++i)
{
double max_val = input[i];
int max_ind = 0;
for(int j = 0; j < last_dim; ++j)
{
if(input[i+j*diff] > max_val) {
max_val = input[i+j*diff];
max_ind = j;
}
}
result[i] = max_ind + 1;
}
NumericVector arr = wrap(result);
arr.attr(\"dim\") = dims;
return arr ;
}"
)
Timings:
set.seed(42)
array4d <- array(runif(5*500*50*5, -1, 0), dim = c(5, 500, 50, 5))
library(microbenchmark)
microbenchmark(
check = "equal", control=list(order="block")
, method1(array4d) #Using code from Question
, method2(array4d) #Using code from Question
, apply_which_max(array4d) #Using code from Allan Cameron
, method4(array4d)
, method5(array4d)
)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# method1(array4d) 200857.804 228567.850 266815.6275 254530.3050 294578.1125 423838.879 100 d
# method2(array4d) 144767.680 149616.981 162367.6556 150688.1860 182290.4980 315650.052 100 c
# apply_which_max(array4d) 3131.482 3153.712 3346.1025 3175.9445 3206.2220 5922.866 100 a
# method4(array4d) 58618.275 60777.584 62334.8258 61198.1815 61702.2170 165254.042 100 b
# method5(array4d) 894.823 902.862 972.2953 911.9845 927.0885 2643.957 100 a
For random select, instead of first match:
Rcpp::cppFunction("
NumericVector method6(const NumericVector &input){
std::srand(std::time(nullptr));
std::vector<int> dims = input.attr(\"dim\");
int last_dim = dims.back();
int diff = input.size()/last_dim;
std::vector<int> result(diff);
dims.pop_back();
for(int i = 0; i < diff; ++i)
{
double max_val = input[i];
std::vector<int> max_ind = {0};
for(int j = 1; j < last_dim; ++j)
{
if(input[i+j*diff] > max_val) {
max_val = input[i+j*diff];
max_ind.clear();
max_ind.push_back(j);
} else if(input[i+j*diff] == max_val) max_ind.push_back(j);
}
result[i] = max_ind[std::rand() % max_ind.size()] + 1;
}
NumericVector arr = wrap(result);
arr.attr(\"dim\") = dims;
return arr ;
}"
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With