Add a column of ranks

Tags: r, ranking

I have some data:

test <- data.frame(A=c("aaabbb",
"aaaabb",
"aaaabb",
"aaaaab",
"bbbaaa")
)

and so on. All the elements are the same length, and are already sorted before I get them.

I need to add a new column of ranks: "First", "Second", "Third". Anything after that can be left blank, and tied values need to share a rank. So in the above case, I'd like to get the following output:

   A       B
 aaabbb  First
 aaaabb  Second
 aaaabb  Second
 aaaaab  Third
 bbbaaa
 bbbbaa  

I looked at rank() and some other posts that used it, but I wasn't able to get it to do what I was looking for.

pak asked Jun 13 '13 22:06


2 Answers

How about this:

test$B <- match(test$A , unique(test$A)[1:3] )
test
       A  B
1 aaabbb  1
2 aaaabb  2
3 aaaabb  2
4 aaaaab  3
5 bbbaaa NA
6 bbbbaa NA

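This is one of many ways to do it. It's possibly not the best, but it readily springs to mind and is fairly intuitive. It works because unique() keeps values in order of first appearance and you receive the data pre-sorted, so a value's position among the unique values is its rank.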
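The question asked for the labels "First", "Second", "Third" rather than numbers. Here is a minimal sketch of one way to get them from the match() result. It assumes the six-row data shown in the question's expected output; the `labels` vector and the use of an empty string for "blank" are illustrative choices, not part of the answer above.

```r
# Six rows, matching the expected output in the question (assumed data).
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb",
                         "aaaaab", "bbbaaa", "bbbbaa"))

# Rank by order of first appearance; anything past the third value is NA.
idx <- match(test$A, unique(test$A)[1:3])

# Hypothetical label vector: index it by rank, then blank out the NAs.
labels <- c("First", "Second", "Third")
test$B <- labels[idx]
test$B[is.na(test$B)] <- ""

test$B
# First, Second, Second, Third, followed by two empty strings
```

If NA is an acceptable "blank", the last assignment can simply be dropped.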

Because the data are sorted, another function worth considering is rle, although it's slightly more obscure in this example:

# as.character() works whether A is a factor or a character column
rnk <- rle(as.character(test$A))$lengths
rnk
# [1] 1 2 1 1 1
test$B <- c( rep( 1:3 , times = rnk[1:3] ) , rep(NA, sum( rnk[-c(1:3)] ) ) )

rle computes the lengths (and values which we don't really care about here) of runs of equal values in a vector - so again this works because your data are already sorted.
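To make the run-length idea concrete, here is a small sketch of what rle() returns for the six values from the question (typed in as a plain character vector `x`, which is an assumption about the data), and of why the input has to be sorted first.

```r
# The six values from the question's expected output (assumed data).
x <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

runs <- rle(x)
runs$lengths  # 1 2 1 1 1 : the second value occurs twice in a row
runs$values   # the five distinct values, in order of appearance

# Repeating each run's index by its length gives a dense rank.
rank_all <- rep(seq_along(runs$lengths), times = runs$lengths)
rank_all      # 1 2 2 3 4 5

# inverse.rle() rebuilds the original vector from the runs.
identical(inverse.rle(runs), x)  # TRUE

# On unsorted input, equal values that are not adjacent form separate runs,
# so the run index would no longer be a valid rank.
rle(c("a", "b", "a"))$lengths  # 1 1 1
```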

And if you don't need blanks after the third-ranked item, it's even simpler (and more readable):

test$B <- rep(seq_along(rnk), times = rnk)
Simon O'Hanlon answered Sep 27 '22 00:09


This seems like a good application for factors:

test$B <- as.numeric(factor(test$A, levels = unique(test$A)))
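The `levels = unique(test$A)` argument is what makes this work. A short sketch of the difference, using the six values from the question as a plain character vector `x` (an assumption about the data):

```r
# The six values from the question's expected output (assumed data).
x <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# Default: levels are the sorted unique values, so the integer codes
# follow alphabetical order rather than the order of the data.
by_alpha <- as.integer(factor(x))
by_alpha       # 3 2 2 1 4 5

# With levels = unique(x), levels follow order of first appearance,
# so the codes are the ranks we want.
by_appearance <- as.integer(factor(x, levels = unique(x)))
by_appearance  # 1 2 2 3 4 5
```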

cumsum also comes to mind, where we add 1 every time the value changes:

test$B <- cumsum(c(TRUE, tail(test$A, -1) != head(test$A, -1)))
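Broken into steps, the cumsum approach looks roughly like this. The sketch uses the six values from the question as a plain character vector `x`; the final step that blanks ranks above 3 is an illustrative addition, not part of the one-liner above.

```r
# The six values from the question's expected output (assumed data).
x <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# Compare each element with the one before it. The first element has no
# predecessor, so it always counts as a change (TRUE).
changed <- c(TRUE, tail(x, -1) != head(x, -1))
changed  # TRUE TRUE FALSE TRUE TRUE TRUE

# TRUE counts as 1, so the running total goes up by one at each new value.
r <- cumsum(changed)
r        # 1 2 2 3 4 5

# Illustrative extra step: keep only the first three ranks.
r[r > 3] <- NA
r        # 1 2 2 3 NA NA
```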

(Like @Simon said, there are many ways to do this...)

flodel answered Sep 23 '22 00:09