
Unexpected apply function behaviour in R

Tags: r, apply

I've discovered a surprising behaviour of apply that I wonder if anyone can explain. Let's take a simple matrix:

> (m = matrix(1:8,ncol=4))
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

We can flip it vertically thus:

> apply(m, MARGIN=2, rev)
     [,1] [,2] [,3] [,4]
[1,]    2    4    6    8
[2,]    1    3    5    7

This applies the rev() vector reversal function iteratively to each column. But when we try to apply rev by row we get:

> apply(m, MARGIN=1, rev)
     [,1] [,2]
[1,]    7    8
[2,]    5    6
[3,]    3    4
[4,]    1    2

... a 90-degree anti-clockwise rotation! apply delivers the same result using FUN=function(v) {v[length(v):1]}, so it is definitely not rev's fault.
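The claim that rev is not to blame can be checked directly. This is a minimal sketch; the names rev_manual, by_rev and by_manual are introduced here for illustration only:

```r
m <- matrix(1:8, ncol = 4)

# Hand-written reversal, equivalent to the anonymous function in the text
rev_manual <- function(v) v[length(v):1]

by_rev    <- apply(m, MARGIN = 1, rev)
by_manual <- apply(m, MARGIN = 1, rev_manual)

# Both calls produce the same 4 x 2 matrix
print(identical(by_rev, by_manual))
print(dim(by_rev))
```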

Any explanation for this?

geotheory asked Feb 12 '14 13:02



2 Answers

This is because apply builds its result column-wise: the value returned by each call to the function becomes one column of the output, and here you're iterating over the rows.

With MARGIN=1, apply passes each row to the function, and each returned vector then becomes a column in the result.

Substituting print for rev shows what is being passed to the function at each iteration:

> x <- apply(m, 1, print)
[1] 1 3 5 7
[1] 2 4 6 8

That is, each call to print is passed a vector. There are two calls, which receive c(1,3,5,7) and c(2,4,6,8) in turn.

Reversing these gives c(7,5,3,1) and c(8,6,4,2), which are then used as the columns of the return matrix, giving the result that you see.
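The column-wise assembly described above can be mimicked by hand. This is only a sketch of the idea, not apply's actual implementation, and the names reversed_rows, by_hand and flipped are made up for illustration. It also shows the usual workaround, assuming the goal is to reverse each row while keeping the original 2 x 4 shape: transpose the result with t().

```r
m <- matrix(1:8, ncol = 4)

# Reverse each row separately, then bind the results together as columns
reversed_rows <- lapply(seq_len(nrow(m)), function(i) rev(m[i, ]))
by_hand <- do.call(cbind, reversed_rows)

# Same values, same 4 x 2 layout as apply(m, 1, rev)
print(all(by_hand == apply(m, 1, rev)))

# Transposing puts each reversed row back in row position
flipped <- t(apply(m, 1, rev))
print(flipped)

# For this particular task, plain column indexing gives the same matrix
print(all(flipped == m[, ncol(m):1]))
```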

Matthew Lundberg answered Oct 17 '22 14:10



The documentation states that

If each call to FUN returns a vector of length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1.

From that perspective, this behaviour is not a bug at all; it is how apply is intended to work.

One may wonder why this was chosen as the default, instead of preserving the structure of the original matrix. Consider the following example:
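The documented rule can be checked against the matrix from the question. A small sketch, with n_row_case, n_col_case, res_rows and res_cols as illustrative names:

```r
m <- matrix(1:8, ncol = 4)

# MARGIN = 1: rev receives a row of length 4, so n = 4
n_row_case <- ncol(m)
res_rows <- apply(m, 1, rev)
print(identical(dim(res_rows), c(n_row_case, dim(m)[1])))  # c(n, dim(X)[MARGIN])

# MARGIN = 2: rev receives a column of length 2, so n = 2,
# and the result happens to have the same shape as m
n_col_case <- nrow(m)
res_cols <- apply(m, 2, rev)
print(identical(dim(res_cols), c(n_col_case, dim(m)[2])))
```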

> apply(m, 1, quantile)
     [,1] [,2]
0%    1.0  2.0
25%   2.5  3.5
50%   4.0  5.0
75%   5.5  6.5
100%  7.0  8.0

> apply(m, 2, quantile)
     [,1] [,2] [,3] [,4]
0%   1.00 3.00 5.00 7.00
25%  1.25 3.25 5.25 7.25
50%  1.50 3.50 5.50 7.50
75%  1.75 3.75 5.75 7.75
100% 2.00 4.00 6.00 8.00

> all(rownames(apply(m, 2, quantile)) == rownames(apply(m, 1, quantile)))
[1] TRUE

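In both cases the result of each call occupies one column, so the quantile labels always run down the rows. Consistent? Indeed, why would we expect anything else?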

tonytonov answered Oct 17 '22 13:10
