When a single criterion is well ordered, the rank function returns the obvious thing:
rank(c(2,4,1,3,5))
[1] 2 4 1 3 5
When a single criterion has ties, the rank function (by default) assigns average ranks to the ties:
rank(c(2,4,1,1,5))
[1] 3.0 4.0 1.5 1.5 5.0
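Average ranks are only the default. As a small illustration, rank's ties.method argument also accepts "min" and "max" (among others), which give every tied element the lowest or the highest rank of its group:

```r
x <- c(2, 4, 1, 1, 5)

# Default: tied values share the average of the ranks they span
rank(x)                       # 3.0 4.0 1.5 1.5 5.0

# "min": tied values all get the lowest rank in their group
rank(x, ties.method = "min")  # 3 4 1 1 5

# "max": tied values all get the highest rank in their group
rank(x, ties.method = "max")  # 3 4 2 2 5
```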
The rank function doesn't accept multiple criteria, so you have to use something else. One way to do it is to combine match and order. For a single criterion without ties the results are the same:
rank(c(2,4,1,3,5))
[1] 2 4 1 3 5
match(1:5, order(c(2,4,1,3,5)))
[1] 2 4 1 3 5
For a single criterion with ties, however, the results differ:
rank(c(2,4,1,4,5))
[1] 2.0 3.5 1.0 3.5 5.0
match(1:5, order(c(2,4,1,4,5)))
[1] 2 3 1 4 5
The ties are broken in such a way that the tied elements have their original order preserved rather than being assigned equal ranks. This feature generalizes, obviously, when you sort on multiple criteria:
match(1:5, order(c(2,4,1,4,5),c(10,11,12,11,13)))
[1] 2 3 1 4 5
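For a single criterion, the match/order idiom gives the same answer as rank with ties.method = "first", and order(order(x)) is an equivalent shorthand. A quick check on the example vector (the variable names are only for illustration):

```r
x <- c(2, 4, 1, 4, 5)

via_match <- match(seq_along(x), order(x))   # 2 3 1 4 5
via_rank  <- rank(x, ties.method = "first")  # 2 3 1 4 5
via_order <- order(order(x))                 # 2 3 1 4 5

# All three agree: ties are broken by original position
stopifnot(all(via_match == via_rank), all(via_match == via_order))
```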
Finally, the question: Is there a simple, or built-in, way of computing rank using multiple criteria that preserves ties? I've written a function to do it, but it's ugly and seems ridiculously complicated for such a basic functionality...
The rank function ranks the values of a variable in an R data frame. For example, if a data frame df contains a column x, the ranks of the values in x are given by rank(df$x).
If a vector x contains the values 1, 2, 3 in that order, rank(x) returns 1 2 3. To rank from largest to smallest instead, use rank(-x), which returns 3 2 1.
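A minimal illustration of both calls, using a made-up data frame df with a single column x:

```r
# Hypothetical data frame used only for illustration
df <- data.frame(x = c(1, 2, 3))

rank(df$x)    # 1 2 3  (smallest value gets rank 1)
rank(-df$x)   # 3 2 1  (largest value gets rank 1)
```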
interaction does what you need:
> rank(interaction(c(2,4,1,4,5),c(10,11,12,11,13), lex.order=TRUE))
[1] 2.0 3.5 1.0 3.5 5.0
Here is what is happening.
interaction expects factors, so the vectors are coerced. Doing so produces the order in the factor levels as indicated by sort.list, which for numeric is numerically nondecreasing order.
Then to combine the two factors, interaction creates factor levels by varying the second argument fastest (because lex.order=TRUE). Thus ties in the first vector are resolved by the value in the second vector (if possible).
Finally, rank coerces the resulting factor to numeric.
What is actually ranked:
> as.numeric(interaction(c(2,4,1,4,5),c(10,11,12,11,13), lex.order=TRUE))
[1] 5 10 3 10 16
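To see where those numbers come from, it can help to look at the factor levels directly. A short inspection, with a and b as illustrative names for the two criteria:

```r
a <- c(2, 4, 1, 4, 5)
b <- c(10, 11, 12, 11, 13)

levels(factor(a))   # "1" "2" "4" "5"
levels(factor(b))   # "10" "11" "12" "13"

f <- interaction(a, b, lex.order = TRUE)
nlevels(f)          # 16: every combination of the 4 x 4 levels
head(levels(f), 5)  # "1.10" "1.11" "1.12" "1.13" "2.10"

# Integer codes index into that level list; for example (4, 11) is level 10
as.integer(f)       # 5 10 3 10 16
```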
You will save some memory if you supply the option drop=TRUE to interaction. This will change the ranked numeric values, but not their order, so the final result is the same.
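A brief sketch of the drop=TRUE variant, wrapped in a small helper. The name rank_multi is hypothetical, not a built-in function, and the sketch assumes the criteria are plain vectors of equal length without missing values:

```r
# Hypothetical helper: rank on several criteria, keeping ties
rank_multi <- function(..., ties.method = "average") {
  # drop = TRUE keeps only the combinations that actually occur
  f <- interaction(..., lex.order = TRUE, drop = TRUE)
  rank(f, ties.method = ties.method)
}

a <- c(2, 4, 1, 4, 5)
b <- c(10, 11, 12, 11, 13)

# Codes are smaller than without drop, but their order is unchanged
as.integer(interaction(a, b, lex.order = TRUE, drop = TRUE))  # 2 3 1 3 4
rank_multi(a, b)                                              # 2.0 3.5 1.0 3.5 5.0
```

Other tie rules can be requested by name, for example rank_multi(a, b, ties.method = "min").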