
R: apply() type function for two 2-d arrays

I'm trying to find an apply() type function that can run a function that operates on two arrays instead of one.

Sort of like:

apply(X1 = doy_stack, X2 = snow_stack, MARGIN = 2, FUN = r_part(a, b))

The data is a stack of band arrays from Landsat tiles that are stacked together using rbind. Each row contains the data from a single tile, and in the end, I need to apply a function on each column (pixel) of data in this stack. One such stack contains whether each pixel has snow on it or not, and the other stack contains the day of year for that row. I want to run a classifier (rpart) on each pixel and have it identify the snow free day of year for each pixel.

What I'm doing now is pretty silly. First, mapply(paste, doy, snow_free) concatenates the day of year and the snow status for each pixel into a single string. Then apply(strstack, 2, FUN) runs the classifier on each pixel, and inside the applied function I split each string apart again with strsplit. As you might imagine, this is pretty inefficient, especially on 1 million pixels x 300 tiles.

Thanks!

cswingle asked Feb 22 '11 19:02


2 Answers

I wouldn't try to get too fancy. A for loop might be all you need.

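# snow_free() stands for whatever per-pixel function you want to run
n <- ncol(doy_stack)  # one column per pixel
out <- numeric(n)
for (i in seq_len(n)) {
  out[i] <- snow_free(doy_stack[, i], snow_stack[, i])
}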

Or, if you don't want to do the bookkeeping yourself,

sapply(seq_len(n), function(i) snow_free(doy_stack[, i], snow_stack[, i]))
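As a rough, self-contained sketch of the index-based approach, the toy example below uses made-up data (5 tiles by 3 pixels) and a hypothetical stand-in for snow_free() that simply returns the earliest snow-free day of year. It is not the rpart classifier from the question. It uses vapply() rather than sapply(), which is one way to make sure every pixel yields exactly one number.

```r
# Toy data: 5 tiles (rows) x 3 pixels (columns); all values are made up.
doy_stack  <- matrix(c(100, 120, 140, 160, 180), nrow = 5, ncol = 3)
snow_stack <- cbind(c(1, 1, 0, 0, 0),
                    c(1, 1, 1, 0, 0),
                    c(1, 0, 0, 0, 0))

# Hypothetical stand-in for the per-pixel function:
# earliest day of year on which no snow was recorded.
snow_free <- function(doy, snow) min(doy[snow == 0])

# Loop over column indices and pull the matching column from each stack.
# vapply() fails loudly if a call does not return one numeric value.
out <- vapply(seq_len(ncol(doy_stack)),
              function(i) snow_free(doy_stack[, i], snow_stack[, i]),
              numeric(1))
out
```

Swapping in a real classifier should only require changing the body of snow_free(), as long as it still returns a single number per pixel.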
Aaron left Stack Overflow answered Oct 24 '22 16:10


I've just encountered the same problem and, if I understood the question correctly, I may have solved it using mapply.

We'll use two 10x10 matrices populated with uniform random values.

set.seed(1)
X <- matrix(runif(100), 10, 10)
set.seed(2)
Y <- matrix(runif(100), 10, 10)

Next, decide whether the operation runs row-wise or column-wise. For row-wise operations, transpose X and Y and then convert them to data.frames. This is needed because a data.frame is a list whose elements are its columns, and mapply() iterates over the elements of each argument; given a bare matrix, it would iterate over individual cells instead. In this example I'll compute the correlation row-wise.

res.row <- mapply(function(x, y){cor(x, y)}, as.data.frame(t(X)), as.data.frame(t(Y)))
res.row[1]
     V1 
0.36788

should be the same as

cor(X[1,], Y[1,])
[1] 0.36788
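To see why the data.frame conversion matters, the small sketch below compares what mapply() iterates over in each case. The matrix M and the helper functions are made up for illustration only.

```r
M <- matrix(1:6, nrow = 2, ncol = 3)

length(M)                 # 6: a matrix is a vector of cells
length(as.data.frame(M))  # 3: a data.frame is a list of columns

# Given the bare matrix, mapply() visits each cell in turn.
cells <- mapply(function(x) x * 10, M)

# Given the data.frame, mapply() visits each column in turn.
cols <- mapply(function(x) sum(x), as.data.frame(M))
```

Here cells has one entry per matrix cell, while cols has one entry per column.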

For column-wise operations exclude the t():

res.col <- mapply(function(x, y){cor(x, y)}, as.data.frame(X), as.data.frame(Y))

This assumes that X and Y have dimensions compatible with the operation of interest; they do not have to be identical. For instance, a two-sample statistical test applied row-wise needs the same number of rows in each matrix, but not the same number of columns.
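A minimal sketch of that last case, assuming a Welch t-test is the statistical test of interest: the matrices A and B below are made up, share 4 rows, and have 10 and 6 columns respectively.

```r
set.seed(1)
A <- matrix(rnorm(40), nrow = 4, ncol = 10)  # 4 rows, 10 columns
B <- matrix(rnorm(24), nrow = 4, ncol = 6)   # 4 rows, 6 columns

# Transpose so that each original row becomes a data.frame column,
# then let mapply() walk the two column lists in parallel.
p.row <- mapply(function(a, b) t.test(a, b)$p.value,
                as.data.frame(t(A)), as.data.frame(t(B)))

length(p.row)  # one p-value per row of A and B
```

The first element should match t.test(A[1, ], B[1, ])$p.value, in the same way that res.row[1] matches cor(X[1,], Y[1,]).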

polarise answered Oct 24 '22 16:10