
Vectorizing order in R

Tags: r

I am trying to order each row in a matrix with few columns and many rows. Is there a vectorized version of this in R? More concretely, let's set our seed to 10 and make an example matrix:

set.seed(10)
example.matrix = replicate(12,runif(500000))

To order each row of example.matrix, I would use:

ordered.example = apply(example.matrix,1,order)

But that is very slow, and I would love something faster. As an analogy,

rowSums(example.matrix)

is preferable to

apply(example.matrix,1,sum)

Much appreciated.

Yaniv Brandvain asked Apr 18 '13 19:04



1 Answer

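Here's a way of speeding it up several times over (about 7x in the timings below). It's specifically tailored to your example: it relies on every value lying in [0, 1), as runif() output does, so depending on what your real data is like this method may or may not work.

The idea is to add 0 to the first row, 1 to the second and so on, so that the values of row i all fall in [i-1, i). Then collapse the matrix to a vector row by row, call order() once on that vector, subtract each row's offset from the resulting positions, and recombine them into a matrix: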

N = 12; M = 500000; d = replicate(N,runif(M))

system.time(d1<-t(apply(d, 1, order)))
#   user  system elapsed 
#  11.26    0.06   11.34 

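# as.vector(d) + 0:(M-1) recycles the offsets down each column, i.e. adds (row index - 1) to every element
system.time(d2<-matrix(order(as.vector(t(matrix(as.vector(d) + 0:(M-1), nrow = M)))) -
                       rep(0:(M-1), each = N)*N, nrow = M, byrow = TRUE))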
#   user  system elapsed 
#   1.39    0.14    1.53 

# Note: identical() fails because d1 is an integer matrix while d2 is double
# (N = 12 is numeric, so the subtraction yields doubles); the values are the same
sum(abs(d1-d2))
# 0
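
For readers who want the one-liner unpacked, here is a minimal sketch of the same offset trick written step by step. The function name row_order_offset is hypothetical, and it assumes every value lies in [0, 1) with no NAs, as in the runif() example. The second function, row_order_general (also a hypothetical name), is not from the answer above: it swaps in a different technique, order(row(m), m), which sorts by row index first and value second, so it should not need the [0, 1) assumption. I have not timed either on the full 500000 x 12 matrix.

```r
# Sketch of the offset trick, assuming all values are in [0, 1) and there are no NAs.
# row_order_offset is a hypothetical name used for illustration.
row_order_offset <- function(m) {
  stopifnot(is.matrix(m), all(m >= 0 & m < 1))
  n_row <- nrow(m)
  n_col <- ncol(m)
  # Shift row i into [i - 1, i); row(m) holds each cell's row index.
  shifted <- m + (row(m) - 1)
  # Flatten row by row, so each row is one contiguous block of n_col values.
  flat <- as.vector(t(shifted))
  # A single order() call: block i of the result holds only positions from row i.
  ord <- order(flat)
  # Turn global positions into within-row positions 1..n_col.
  # All-integer arithmetic keeps the result an integer matrix, like apply(m, 1, order).
  within <- ord - rep(seq_len(n_row) - 1L, each = n_col) * n_col
  matrix(within, nrow = n_row, byrow = TRUE)
}

# Alternative sketch (a different technique, not the answer's method):
# order by row index first, then by value, with no restriction on the value range.
row_order_general <- function(m) {
  stopifnot(is.matrix(m))
  # Positions in the column-major vector of m, sorted by row, then by value.
  ord <- order(row(m), m)
  # Column-major position k sits in column (k - 1) %/% nrow(m) + 1.
  cols <- (ord - 1L) %/% nrow(m) + 1L
  matrix(cols, nrow = nrow(m), byrow = TRUE)
}

# Small check against the apply() version.
set.seed(10)
small <- replicate(3, runif(4))          # 4 rows, 3 columns, values in [0, 1)
reference <- t(apply(small, 1, order))
stopifnot(all(row_order_offset(small) == reference))
stopifnot(all(row_order_general(small) == reference))
```

One caveat with the offset version: adding a large row index to a double discards low-order bits of the fractional part, so two nearly equal values in a long matrix could in principle become tied after the shift. The order(row(m), m) variant compares the original values and avoids that.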
eddi answered Sep 20 '22 18:09