Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort a matrix/data.frame by all columns

Tags:

sorting

r

matrix

I have a matrix, e.g.:

a = rep(0:1, each=4)
b = rep(rep(0:1, each=2), 2)
c = rep(0:1, times=4)
mat = cbind(c,b,a)

I need to sort all columns of this matrix. I know how to do this by sorting specific columns (i.e. a limited number of columns).

mat[order(mat[,"c"],mat[,"b"],mat[,"a"]),]
     c b a
[1,] 0 0 0
[2,] 0 0 1
[3,] 0 1 0
[4,] 0 1 1
[5,] 1 0 0
[6,] 1 0 1
[7,] 1 1 0
[8,] 1 1 1

However, I need a generic way of doing this without calling any column names, because I could have any number of columns. How can I sort by a large number of columns?

like image 676
wen Avatar asked Feb 11 '23 12:02

wen


1 Answers

Here's a concise solution:

mat[do.call(order,as.data.frame(mat)),];
##      c b a
## [1,] 0 0 0
## [2,] 0 0 1
## [3,] 0 1 0
## [4,] 0 1 1
## [5,] 1 0 0
## [6,] 1 0 1
## [7,] 1 1 0
## [8,] 1 1 1

The call to as.data.frame() converts the matrix to a data.frame in the intuitive way, i.e. each matrix column becomes a list component in the new data.frame. From that, you can effectively pass each matrix column to a single invocation of order() by passing the listified form of the matrix as the second argument of do.call().

This will work for any number of columns.


It's not a dumb question. The reason that mat[order(as.data.frame(mat)),] does not work is because order() does not order data.frames by row.

Instead of returning a row order for the data.frame based on ordering the column vectors from left to right (which is what my solution does), it basically flattens the data.frame to a single big vector and orders that.

So, in fact, order(as.data.frame(mat)) is equivalent to order(mat), as a matrix is treated as a flat vector as well.

For your particular data, this returns 24 indexes, which could theoretically be used to index (as a vector) the original matrix mat, but since in the expression mat[order(as.data.frame(mat)),] you're trying to use them to index just the row dimension of mat, some of the indexes are past the highest row index, so you get a "subscript out of bounds" error.

See ?do.call.

I don't think I can explain it better than the help page; take a look at the examples, play with them until you get how it works. Basically, you need to call it when the arguments you want to pass to a single invocation of a function are trapped inside a list.

You can't pass the list itself (because then you're not passing the intended arguments, you're passing a list containing the intended arguments), so there must be a primitive function that "unwraps" the arguments from the list for the function call.

This is a common primitive in programming languages where functions are first-class objects, notably (besides R's do.call()) JavaScript's apply(), Python's (deprecated) apply(), and vim's call().

like image 111
bgoldst Avatar answered Feb 13 '23 01:02

bgoldst