
Is there a simple way to rank on multiple criteria that preserves ties in R?

When a single criterion is well ordered, the rank function returns the obvious thing:

rank(c(2,4,1,3,5))
[1] 2 4 1 3 5

When a single criterion has ties, the rank function (by default) assigns average ranks to the ties:

rank(c(2,4,1,1,5))
[1] 3.0 4.0 1.5 1.5 5.0

The rank function doesn't accept multiple criteria, so you have to use something else. One way is to combine match and order. For a single criterion without ties, the results are the same:

rank(c(2,4,1,3,5))
[1] 2 4 1 3 5

match(1:5, order(c(2,4,1,3,5)))
[1] 2 4 1 3 5

For a single criterion with ties, however, the results differ:

rank(c(2,4,1,4,5))
[1] 2.0 3.5 1.0 3.5 5.0

match(1:5, order(c(2,4,1,4,5)))
[1] 2 3 1 4 5

Ties are broken by position: tied elements keep their original order rather than receiving equal ranks. The same thing happens when you sort on multiple criteria and elements tie on all of them:

match(1:5, order(c(2,4,1,4,5),c(10,11,12,11,13)))
[1] 2 3 1 4 5
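
For what it's worth, the positional tie-breaking shown above looks like what rank itself does with ties.method = "first". A minimal sketch comparing the three results, reusing the vector from the example (the variable names are only for illustration):

```r
x <- c(2, 4, 1, 4, 5)

# Rank via order: where each index lands in the sorted order
by_order <- match(seq_along(x), order(x))

# rank() with ties broken by original position
by_first <- rank(x, ties.method = "first")

# Default rank() averages the tied positions instead
by_average <- rank(x)

print(by_order)
print(by_first)
print(by_average)
```

So for a single criterion, rank(x, ties.method = "first") is a shorter spelling of the match/order idiom, but it still takes only one vector and does not preserve ties.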

Finally, the question: Is there a simple, or built-in, way of computing rank on multiple criteria that preserves ties? I've written a function to do it, but it's ugly and seems ridiculously complicated for such basic functionality...

user1939887 asked Dec 31 '12 16:12



1 Answer

interaction does what you need:

> rank(interaction(c(2,4,1,4,5),c(10,11,12,11,13), lex.order=TRUE))
[1] 2.0 3.5 1.0 3.5 5.0

Here is what is happening.

interaction expects factors, so the vectors are coerced. The coercion orders the factor levels as sort.list would, which for numeric input is numerically nondecreasing order.
Then, to combine the two factors, interaction creates the combined factor levels by varying the second argument fastest (because lex.order=TRUE). Thus ties in the first vector are resolved by the value in the second vector (if possible).
Finally, rank works on the integer codes of the resulting factor, so elements that share a level receive the same (averaged) rank.
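
The steps above can be traced in a short sketch. The vectors a and b are the ones from the example; b2 is a hypothetical variant, not part of the original, in which the two tied 4s differ in the second criterion so that the tie actually gets broken:

```r
a  <- c(2, 4, 1, 4, 5)
b  <- c(10, 11, 12, 11, 13)
b2 <- c(10, 12, 12, 11, 13)  # hypothetical: the two 4s now differ here

# Step 1: coercion to factor puts the distinct values in increasing order
lev_a <- levels(as.factor(a))
lev_b <- levels(as.factor(b))
print(lev_a)
print(lev_b)

# Step 2: with lex.order = TRUE the second factor varies fastest,
# so levels sort by a first and by b within each value of a
f <- interaction(a, b, lex.order = TRUE)
print(head(levels(f)))

# Step 3: identical (a, b) pairs share a level, so rank() ties them
tied <- rank(f)
print(tied)

# With b2 the pairs (4, 12) and (4, 11) differ, so the tie is resolved
untied <- rank(interaction(a, b2, lex.order = TRUE))
print(untied)
```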

What is actually ranked:

> as.numeric(interaction(c(2,4,1,4,5),c(10,11,12,11,13), lex.order=TRUE))
[1]  5 10  3 10 16

You will save some memory if you supply the option drop=TRUE to interaction. This will change the ranked numeric values, but not their order, so the final result is the same.
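
A quick check of the drop=TRUE claim, as a sketch on the same two vectors (the names full and dropped are only for illustration):

```r
a <- c(2, 4, 1, 4, 5)
b <- c(10, 11, 12, 11, 13)

full    <- interaction(a, b, lex.order = TRUE)
dropped <- interaction(a, b, lex.order = TRUE, drop = TRUE)

# Without drop, every combination of levels is kept, used or not
print(nlevels(full))
print(nlevels(dropped))

# The integer codes differ, but their relative order does not
print(as.integer(full))
print(as.integer(dropped))

# So the ranks agree
same_ranks <- all(rank(full) == rank(dropped))
print(same_ranks)
```

Without drop, the number of levels is the product of the distinct values in each criterion, so the saving is likely to matter most with many criteria or many distinct values.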

Matthew Lundberg answered Sep 18 '22 18:09