When there are ties in the original data, is there a way to create a ranking without gaps in the ranks (consecutive, integer rank values)? Suppose:
x <- c(10, 10, 10, 5, 5, 20, 20)
rank(x)
# [1] 4.0 4.0 4.0 1.5 1.5 6.5 6.5
In this case the desired result would be:
my_rank(x)
[1] 2 2 2 1 1 3 3
I've played with all the options for the ties.method option (average, max, min, random), none of which are designed to provide the desired result.
Is it possible to achieve this with the rank() function?
Tied observations are given the average of the ranks they would have received as if no ranks were tied. For example, if three teams are tied for first, they are also statistically tied for second and for third as well.
There is a formula to quickly rank values based on group. Select a blank cell next to the data, C2 for instance, type this formula, =SUMPRODUCT(($A$2:$A$11=A2)*(B2<$B$2:$B$11))+1 then drag autofill handle down to apply this formula to the cells you need.
The formula is =RANK(B4,$B$4:$B$13). Notice that the formula does not return a 3 or a 9 because there are two sets of ties among the scores: Tom and Sophia both scored 245, while Mike and Nick both scored 138.
To rank list data without ties, you only need a formula. Select a blank cell that will place the ranking, type this formula =RANK($B2,$B$2:$B$9)+COUNTIF(B$2:B2,B2)-1, press Enter key, and drag the fill handle down to apply this formula.
Modified crayola solution, but using match instead of merge:
x_unique <- unique(x)
x_ranks <- rank(x_unique)
x_ranks[match(x,x_unique)]
Edit: or in a one-liner, as per @hadley's comment:
match(x, sort(unique(x)))
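To get the my_rank() call from the question, the one-liner can be wrapped in a function. This is only a minimal sketch: the name my_rank comes from the question, not from any package, and it assumes a plain atomic vector. NA values come back as NA because sort() drops them from the lookup table.

```r
# Dense ranking: ties share a rank, and ranks are consecutive integers.
# my_rank is the hypothetical name used in the question, not a base R function.
my_rank <- function(x) {
  # sort(unique(x)) lists the distinct values in increasing order;
  # match() returns each element's position in that list.
  match(x, sort(unique(x)))
}

x <- c(10, 10, 10, 5, 5, 20, 20)
my_rank(x)
# [1] 2 2 2 1 1 3 3
```

The result is an integer vector, so it can be used directly as an index or grouping key.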
The "loopless" way to do it is to simply treat the vector as an ordered factor, then convert it to numeric:
> as.numeric( ordered( c( 10,10,10,10, 5,5,5, 10, 10 ) ) )
[1] 2 2 2 2 1 1 1 2 2
> as.numeric( ordered( c(0.5,0.56,0.76,0.23,0.33,0.4) ))
[1] 4 5 6 1 2 3
> as.numeric( ordered( c(1,1,2,3,4,5,8,8) ))
[1] 1 1 2 3 4 5 6 6
Update: Another way, which seems faster, is to use findInterval and sort(unique()):
> x <- c( 10, 10, 10, 10, 5,5,5, 10, 10)
> findInterval( x, sort(unique(x)))
[1] 2 2 2 2 1 1 1 2 2
> x <- round( abs( rnorm(1000000)*10))
> system.time( z <- as.numeric( ordered( x )))
user system elapsed
0.996 0.025 1.021
> system.time( z <- findInterval( x, sort(unique(x))))
user system elapsed
0.077 0.003 0.080
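Before picking one approach on speed alone, it may be worth checking that the three give the same ranks on your own data. The sketch below does that on a smaller random sample; the seed and the sample size are arbitrary choices made here for illustration, not part of the answers above.

```r
set.seed(1)  # arbitrary seed, only so the check is reproducible
x <- round(abs(rnorm(1000) * 10))

lookup <- sort(unique(x))              # distinct values in increasing order

r_match    <- match(x, lookup)         # integer ranks
r_factor   <- as.numeric(ordered(x))   # same ranks, stored as doubles
r_interval <- findInterval(x, lookup)  # integer ranks

# All three should agree element by element.
stopifnot(all(r_match == r_factor), all(r_match == r_interval))
```

Note that as.numeric(ordered(x)) returns doubles while the other two return integers, so compare with == or all.equal() rather than identical().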