Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apply a function to a multi-dimensional array: R vs MATLAB

Tags:

arrays

r

matlab

This question can be considered related to this one, that helped me to improve the R performances in computing the mean on a big array. Unfortunately, in this case I'm trying to apply something more complex (like a quantile calculation).

I have a 4-D array with more than 40 millions of elements and I want to calculate the 66th percentile on a specific dimension. Here there is the MATLAB code:

> n = randn(100, 50, 100, 20);
> tic; q = quantile(n, 0.66, 4); toc
Elapsed time is 0.440824 seconds.

Let's do something similar in R.

> n = array(rnorm(100*50*100*20), dim = c(100,50,100,20))
> start = Sys.time(); q = apply(n, 1:3, quantile, .66); print(Sys.time() - start)
Time difference of 1.600693 mins

I was aware of the better performances of MATLAB wrt R but in this case I don't know what to do. Probably I just need to wait 2 minutes instead of one second... I hope someone can suggest me any way to improve running times, anyway, thank you in advance...

UPDATE I've applied some of the suggestions into the comments and I've reduced the running time:

> start = Sys.time(); q = apply(n, 1:3, quantile, .66, names = FALSE); print(Sys.time() - start)
Time difference of 33.42773 secs

We're still far from the MATLAB performances but at least I've learnt something.

UPDATE I put here some advancements related to `quantile' function discussed here. The running time of same code I've shown above has passed from 33 to 5 seconds...

like image 414
Matteo De Felice Avatar asked Apr 02 '14 07:04

Matteo De Felice


1 Answers

The RcppOctave package calls the GNU Octave API functions; if you don't already know about GNU Octave, it is very similar to Matlab and aims for complete compatiility.

This is nearly as fast as Matlab direct...

library(RcppOctave)

set.seed(1)
n = array(rnorm(100*50*100*20), dim = c(100,50,100,20))

system.time( s <- octave_style_quantile(n, .66, 4) )
##    user  system elapsed 
##   0.526   0.048   0.574

# *R* `quantile` argument `type=5` is the method that matches matlab.
system.time( r <- apply(n, 1:3, quantile, .66, names = FALSE, type=5) )
##    user  system elapsed 
##  23.308   0.029  23.327

dim(r)
## [1] 100  50 100

identical(r,s)
## [1] TRUE

A fairly straight forward translation of Octave's quantile.m to R.

octave_style_quantile <- function (x, p=NULL, dim=NULL) {
  if ( is.null(p) ) p <- c(0.00, 0.25, 0.50, 0.75, 1.00)

  if ( is.null(dim) ) {
    ## Find the first non-singleton dimension.
    dim <- which(dim(x) > 1)[1];
  }

  stopifnot( is.numeric(x)||is.logical(x),
             is.numeric(p),
             dim <= length(dim(x)) )

  ## Set the permutation vector.
  perm <- seq_along(dim(x))
  perm[1] <- dim
  perm[dim] <- 1

  ## Permute dim to the 1st index.
  x <- aperm(x, perm);

  ## Save the size of the permuted x N-d array.
  sx = dim(x);

  ## Reshape to a 2-d array.
  dim(x) <- c( sx[1], prod(sx[-1]) );

  ## Calculate the quantiles.
  q = .CallOctave("quantile",x,p)

  ## Return the shape to the original N-d array.
  dim(q) <- c( length(p), sx[-1] )

  ## Permute the 1st index back to dim.
  q = aperm(q, perm);
  if( any(dim(q)==1) ) dim(q) <- dim(q)[-which(dim(q)==1)]
  q
}
like image 63
Thell Avatar answered Oct 19 '22 23:10

Thell