Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to reorder 1st dim of array in R (don't know total dims)

I have an array whose first dimension I need to subset/ index/ reorder. For example:

arr <- array(1:24, dim=c(4,3,2))
arr[4:1,,]

Simple, works like a charm.

However, is there a way to do this when I'm unsure how many dimensions the array has? To be clear, I will always know the size of that first dimension (i.e., I know dim(arr)[1]), I just don't know length(dim(arr)).

like image 253
rbatt Avatar asked Feb 09 '23 07:02

rbatt


1 Answers

Here's a weird alternative. This idea is based on an implementation quirk I noticed at one point, that R seems to represent "missing" function arguments as symbols with zero-length names. One of the reasons that this is so bizarre is that R normally doesn't allow you to create symbols with zero-length names:

as.symbol('');
## Error in as.symbol("") : attempt to use zero-length variable name

But through some messing around, I discovered that you can slip past R's defenses by accessing the parse tree of an expression that involves a "missing" argument, and indexing out the element of the parse tree that contains the "missing" argument. Here's a demo of some of the weird behavior you get from this thing:

substitute(x[]); ## parse tree involving missing argument
## x[]
as.list(substitute(x[])); ## show list representation; third component is the guy
## [[1]]
## `[`
##
## [[2]]
## x
##
## [[3]]
##
##
substitute(x[])[[3]]; ## prints nothing!
##
(function(x) c(typeof(x),mode(x),class(x)))(substitute(x[])[[3]]); ## it's a symbol alright
## [1] "symbol" "name"   "name"
as.character(substitute(x[])[[3]]); ## gets the name of the symbol: the empty string!
## [1] ""
i.dont.exist <- substitute(x[])[[3]]; ## store in variable
i.dont.exist; ## wha??
## Error: argument "i.dont.exist" is missing, with no default

Anyway, here's the solution we can derive for the OP's problem:

arr <- array(1:24,4:2);
do.call(`[`,c(list(arr,4:1),rep(list(substitute(x[])[[3]]),length(dim(arr))-1)));
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    4    8   12
## [2,]    3    7   11
## [3,]    2    6   10
## [4,]    1    5    9
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]   16   20   24
## [2,]   15   19   23
## [3,]   14   18   22
## [4,]   13   17   21
##

I was hoping it would outperform all the other solutions, but @thelatemail, you win this round: Aha! I realized that we can precompute a list of an empty symbol (storing an empty symbol in a variable by itself, i.e. not in a list, is not usable, as I showed above) and rep() that list in the solution, rather than incurring all the overhead of substitute() to parse out a dummy expression in every invocation of the solution. And behold the performance:

straight <- function() arr[4:1,,];
jb <- function() do.call(`[`,c(list(arr,4:1),lapply(dim(arr)[-1],seq_len)));
tlm <- function() do.call(`[`,c(list(arr,4:1),rep(TRUE,length(dim(arr))-1)));
orderD1 <- function(x,ord) { dims <- dim(x); ndim <- length(dims); stopifnot(ndim>0); if (ndim==1) return(x[ord]); wl_i <- which(letters=="i"); dimLetters <- letters[wl_i:(wl_i+ndim-1)]; dimList <- structure(vector("list",ndim),.Names=dimLetters); dimList[[1]] <- ord; for (i in 2:ndim) dimList[[i]] <- 1:dims[i]; do.call("[",c(list(x=x),dimList)); };
rbatt <- function() orderD1(arr,4:1);
bgoldst <- function() do.call(`[`,c(list(arr,4:1),rep(list(substitute(x[])[[3]]),length(dim(arr))-1)));
ls0 <- list(substitute(x[])[[3]]);
ls0;
## [[1]]
##
##
bgoldst2 <- function() do.call(`[`,c(list(arr,4:1),rep(ls0,length(dim(arr))-1)));

microbenchmark(straight(),jb(),tlm(),rbatt(),bgoldst(),bgoldst2(),times=1e5);
## Unit: nanoseconds
##        expr   min    lq      mean median    uq      max neval
##  straight()   428   856  1161.038    856  1284   998142 1e+05
##        jb()  4277  5988  7136.534   6843  7271  1629357 1e+05
##       tlm()  2566  3850  4622.668   4277  4705  1704196 1e+05
##     rbatt() 24804 28226 31975.583  29509 31219 34970873 1e+05
##   bgoldst()  3421  4705  5601.300   5132  5560  1918878 1e+05
##  bgoldst2()  2566  3850  4533.383   4277  4705  1034065 1e+05

Just discovered that there's an easier way to get ahold of the empty symbol, which seems to have been available all along:

substitute();
##

My substitute(x[])[[3]] trick now looks kind of stupid.

Out of curiosity I benchmarked using substitute() directly against the other solutions, and it incurs a slight performance cost compared to bgoldst2(), making it slightly worse than tlm():

bgoldst3 <- function() do.call(`[`,c(list(arr,4:1),rep(list(substitute()),length(dim(arr))-1)));
microbenchmark(straight(),jb(),tlm(),rbatt(),bgoldst(),bgoldst2(),bgoldst3(),times=1e5);
## Unit: nanoseconds
##        expr   min    lq      mean median    uq      max neval
##  straight()   428   856  1069.340    856  1284   850603 1e+05
##        jb()  4277  5988  6916.899   6416  7270  2978180 1e+05
##       tlm()  2566  3849  4307.979   4277  4704  3138122 1e+05
##     rbatt() 24377 28226 30882.666  29508 30364 36768360 1e+05
##   bgoldst()  2994  4704  5165.019   5132  5560  2050171 1e+05
##  bgoldst2()  2566  3849  4232.816   4277  4278  1085813 1e+05
##  bgoldst3()  2566  3850  4545.508   4277  4705  1004131 1e+05
like image 163
bgoldst Avatar answered Feb 11 '23 22:02

bgoldst