I have some data:
test <- data.frame(A=c("aaabbb",
"aaaabb",
"aaaabb",
"aaaaab",
"bbbaaa",
"bbbbaa")
)
and so on. All the elements are the same length, and are already sorted before I get them.
I need to make a new column of ranks, "First", "Second", "Third", anything after that can be left blank, and it needs to account for ties. So in the above case, I'd like to get the following output:
A B
aaabbb First
aaaabb Second
aaaabb Second
aaaaab Third
bbbaaa
bbbbaa
I looked at rank() and some other posts that used it, but I wasn't able to get it to do what I was looking for.
How about this:
test$B <- match(test$A , unique(test$A)[1:3] )
test
A B
1 aaabbb 1
2 aaaabb 2
3 aaaabb 2
4 aaaaab 3
5 bbbaaa NA
6 bbbbaa NA
This is one of many ways to do it. It is possibly not the best, but it readily springs to mind and is fairly intuitive. You can use unique() because you receive the data pre-sorted.
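The question asks for text labels rather than numbers. As a minimal sketch, assuming an empty string is an acceptable "blank" and using a hypothetical lookup vector named rank_labels (not part of the original answer), the integer ranks from match() could be mapped onto labels like this:

```r
# Sample data from the question, assumed to arrive already sorted.
test <- data.frame(
  A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vector: position i holds the label for rank i.
rank_labels <- c("First", "Second", "Third")

# Integer rank for the first three distinct values, NA for everything else.
idx <- match(test$A, unique(test$A)[1:3])

# Replace each rank with its label and each NA with an empty string.
test$B <- ifelse(is.na(idx), "", rank_labels[idx])
test
```

Tied values share a label because match() returns the same position for equal strings.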
As the data is sorted, another suitable function worth considering is rle(), although it's slightly more obscure in this example:
rnk <- rle(as.character(test$A))$lengths
rnk
# [1] 1 2 1 1 1
test$B <- c( rep( 1:3 , times = rnk[1:3] ) , rep(NA, sum( rnk[-c(1:3)] ) ) )
rle() computes the lengths (and values, which we don't really care about here) of runs of equal values in a vector, so again this works because your data are already sorted.
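To make the run-length idea concrete, here is a small sketch of what rle() returns for the sample vector. The variable names A and runs are only illustrative:

```r
# The sample values from the question as a plain character vector.
A <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

runs <- rle(A)

# One entry per run of consecutive equal values.
runs$lengths  # 1 2 1 1 1
runs$values   # "aaabbb" "aaaabb" "aaaaab" "bbbaaa" "bbbbaa"

# inverse.rle() expands the runs back into the original vector.
identical(inverse.rle(runs), A)
```

Because rle() only merges adjacent equal values, an unsorted vector with the same string in two separate places would produce two separate runs, and therefore two different ranks.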
And if you don't have to have blanks after the third ranked item it's even simpler (and more readable):
test$B <- rep(1:length(rnk),times=rnk)
This seems like a good application for factors:
test$B <- as.numeric(factor(test$A, levels = unique(test$A)))
cumsum() also comes to mind, where we add 1 every time the value changes:
test$B <- cumsum(c(TRUE, tail(test$A, -1) != head(test$A, -1)))
(Like @Simon said, there are many ways to do this...)
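The factor and cumsum() versions rank every distinct value, not only the first three. As a rough sketch (the names changed, rank_all and rank_top3 are made up for illustration), the cumsum() steps can be broken apart and the result capped at three like this:

```r
# The sample values from the question as a plain character vector.
A <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# TRUE wherever an element differs from the one before it;
# the first element always starts a new rank.
changed <- c(TRUE, tail(A, -1) != head(A, -1))

# Running count of changes gives a dense rank: 1 2 2 3 4 5
rank_all <- cumsum(changed)

# Keep ranks 1 to 3 and set everything after that to NA.
rank_top3 <- ifelse(rank_all <= 3, rank_all, NA)
rank_top3
```

As with the other approaches, this relies on the input already being sorted, since only neighbouring elements are compared.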