
R: ranking values of a column by group, conditional on another variable

Tags:

r

I have a data frame (df) like this:

group col1 col2  
x      a    22    
x      a    23  
x      b    16  
x      b    18  
y      a    11  
y      a    12  
y      a    16  
y      a    45  
y      b    24  
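
For reproducibility, the data above could be built as follows. This is only a sketch: the column types (character for group and col1, numeric for col2) are assumptions, since the question shows just the printed values.

```r
# Rebuild the example data frame from the question.
# Assumption: group and col1 are character columns, col2 is numeric.
df <- data.frame(
  group = c("x", "x", "x", "x", "y", "y", "y", "y", "y"),
  col1  = c("a", "a", "b", "b", "a", "a", "a", "a", "b"),
  col2  = c(22, 23, 16, 18, 11, 12, 16, 45, 24),
  stringsAsFactors = FALSE
)
```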

Desired output is:

group col1 col2 rank 
x      a    22  1  
x      a    23  2
x      b    16  0
x      b    18  0
y      a    11  1
y      a    12  2
y      a    16  3
y      a    45  4
y      b    24  0

Namely,

  • rank col2 within each combination of group and col1
  • rank the values of col2 from smallest to largest
  • when col1 is "b", the rank is 0

How can I do this in R? Any help would be appreciated. Thanks a lot.

oercim asked Mar 15 '15 12:03




2 Answers

You could try:

library(dplyr)
df %>%
    group_by(group, col1) %>%
    # rank col2 within each (group, col1) pair; force the rank to 0 where col1 is 'b'
    mutate(rank = replace(min_rank(col2), col1 == 'b', 0))
#    group col1 col2 rank
#1     x    a   22    1
#2     x    a   23    2
#3     x    b   16    0
#4     x    b   18    0
#5     y    a   11    1
#6     y    a   12    2
#7     y    a   16    3
#8     y    a   45    4
#9     y    b   24    0

If you don't want gaps between ranks when there are ties, replace min_rank with dense_rank.
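
To illustrate the difference on tied values, here is a small sketch. The vector x is made up for illustration and does not come from the question's data.

```r
library(dplyr)

# Hypothetical values with a tie (20 appears twice)
x <- c(10, 20, 20, 30)

min_rank(x)    # 1 2 2 4: the tie leaves a gap, so rank 3 is skipped
dense_rank(x)  # 1 2 2 3: no gap after the tie
```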

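Or, instead of replace, multiply the rank by the logical condition:

res <- df %>%
    group_by(group, col1) %>%
    # (col1 != 'b') is 0 for 'b' rows, so their rank becomes 0
    mutate(rank = (col1 != 'b') * min_rank(col2))

as.data.frame(res)  # convert the grouped result back to a plain data.frame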
 #    group col1 col2 rank
 #1     x    a   22    1
 #2     x    a   23    2
 #3     x    b   16    0
 #4     x    b   18    0
 #5     y    a   11    1
 #6     y    a   12    2
 #7     y    a   16    3
 #8     y    a   45    4
 #9     y    b   24    0
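
The multiplication works because R coerces logical values to 0 and 1 in arithmetic, so rows where col1 is 'b' are multiplied by 0. A minimal sketch with made-up vectors (not the question's data):

```r
col1  <- c("a", "a", "b")   # hypothetical labels
ranks <- c(1L, 2L, 1L)      # hypothetical within-group ranks

col1 != "b"                 # TRUE TRUE FALSE
(col1 != "b") * ranks       # 1 2 0
```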
akrun answered Oct 19 '22 18:10



Or, using data.table (v >= 1.9.5):

library(data.table)
setDT(df)[, rank := frank(col2, ties.method = "dense"),
             by = .(group, col1)][col1 == "b", rank := 0L][]

#    group col1 col2 rank
# 1:     x    a   22    1
# 2:     x    a   23    2
# 3:     x    b   16    0
# 4:     x    b   18    0
# 5:     y    a   11    1
# 6:     y    a   12    2
# 7:     y    a   16    3
# 8:     y    a   45    4
# 9:     y    b   24    0

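Or, as @Arun suggested, you can skip one grouping step by setting the rank to zero for every row first and then ranking only the rows where col1 is not "b" (grouping by group alone is enough here because "a" is the only other value of col1):

setDT(df)[, rank := 0L][col1 != "b", rank := frank(col2, ties.method = "dense"), by = group][]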
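
Grouping by group alone relies on col1 having only one value other than "b". The sketch below shows what could happen otherwise. The table dt and its extra level "c" are hypothetical and are not part of the question's data.

```r
library(data.table)

# Hypothetical data with a third col1 level "c" (an assumption for illustration)
dt <- data.table(
  group = c("x", "x", "x", "x"),
  col1  = c("a", "a", "c", "b"),
  col2  = c(22, 23, 5, 16)
)

# Grouping by `group` only: "a" and "c" rows are ranked together
dt[, rank_by_group := 0L][col1 != "b",
   rank_by_group := frank(col2, ties.method = "dense"), by = group]

# Grouping by both columns: each col1 level is ranked separately
dt[, rank_by_both := 0L][col1 != "b",
   rank_by_both := frank(col2, ties.method = "dense"), by = .(group, col1)]

dt$rank_by_group  # 2 3 1 0
dt$rank_by_both   # 1 2 1 0
```

With more than two levels in col1, keeping by = .(group, col1) is probably the safer choice.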
David Arenburg answered Oct 19 '22 17:10
