Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Assign top 2 ranking by group

Tags:

r

dplyr

rank

I am trying to assign a ranking across a group, and apply the ranking across the whole group My data look like this:

colour <- rep(c("blue", "red"), 3)
day <- rep(c("mon", "tues", "weds"), each = 2)
value <- sample.int(100, 6, TRUE)
dat <- tibble::tibble(colour, day, value)

library(dplyr)
dat %>%
  group_by(day) %>%
  mutate(rank=rank(-value)) %>%
  mutate(top = case_when(rank == 1~colour)) %>%
  mutate(second = case_when(rank == 2~colour))
# A tibble: 6 × 6
# Groups:   day [3]
  colour day   value  rank top   second
  <chr>  <chr> <int> <dbl> <chr> <chr> 
1 blue   mon      34     1 blue  NA    
2 red    mon      16     2 NA    red   
3 blue   tues     24     2 NA    blue  
4 red    tues     36     1 red   NA    
5 blue   weds     87     1 blue  NA    
6 red    weds     12     2 NA    red 

What I am hoping for is that when a colour is the highest value for a particular day, then all observations of that day will display the top colour, idem for second, etc. such that:

# A tibble: 6 × 6
# Groups:   day [3]
  colour day   value  rank top   second
  <chr>  <chr> <int> <dbl> <chr> <chr> 
1 blue   mon      34     1 blue  red   
2 red    mon      16     2 blue  red   
3 blue   tues     24     2 red   blue  
4 red    tues     36     1 red   blue  
5 blue   weds     87     1 blue  red   
6 red    weds     12     2 blue  red   
like image 250
Gilrob Avatar asked Oct 29 '25 17:10

Gilrob


2 Answers

A method specific to just top and second can be done with:

# your data
dat <- structure(list(colour = c("blue", "red", "blue", "red", "blue", "red"), day = c("mon", "mon", "tues", "tues", "weds", "weds"), value = c(34L, 16L, 24L, 36L, 87L, 12L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))

library(dplyr)
dat |>
  mutate(
    .by = day,
    top = colour[which.max(value)],
    second = colour[rank(value, ties.method = "first")][1]
  )
#   colour  day value  top second
# 1   blue  mon    34 blue    red
# 2    red  mon    16 blue    red
# 3   blue tues    24  red   blue
# 4    red tues    36  red   blue
# 5   blue weds    87 blue    red
# 6    red weds    12 blue    red

That demonstrates that we can use which.max(.) for the first, but we really need rank() to determine second place.

We can generalize that to n-th place with

dat |>
  mutate(
    .by = day,
    as.data.frame(lapply(setNames(nm = 1:2),
                         function(p) colour[p == rank(-value, ties.method = "first")][1]),
                  check.names = FALSE)
  )
#   colour  day value    1    2
# 1   blue  mon    34 blue  red
# 2    red  mon    16 blue  red
# 3   blue tues    24  red blue
# 4    red tues    36  red blue
# 5   blue weds    87 blue  red
# 6    red weds    12 blue  red

For the names, we can either:

  • add |> rename(top=`1`, second=`2`) to the pipe
  • replace that setNames with setNames(1:2, c("top", "second"))

For the record, the trailing [1] on colour[rank(.)] ensures that we always get exactly one. If a group does not have enough rows to have a second place, for instance, then without [1] it will error, because colour[rank(.) == 2] returns length-0. When adding [1] after that, when it returns length-0 then the [1] returns NA. Try setNames(nm=1:3) to the above to see what I mean, both with and without the [1].

like image 144
r2evans Avatar answered Nov 01 '25 07:11

r2evans


Here's one way using pivoting. It should work for third and fourth etc. ranks too.

library(tidyr)
library(dplyr)
inner_join(dat,
           dat %>%
  mutate(rank=rank(-value, ties="first"), .by=day) %>%
  pivot_wider(id_cols=day,
              values_from=colour,
              names_from=rank,
              names_prefix="colour"))  %>%
  select(names(dat), sort(names(.)[-c(1:3)]))


# A tibble: 6 × 5
  colour day   value colour1 colour2
  <chr>  <chr> <int> <chr>   <chr>  
1 blue   mon      68 blue    red    
2 red    mon      39 blue    red    
3 blue   tues      1 red     blue   
4 red    tues     34 red     blue   
5 blue   weds     87 blue    red    
6 red    weds     43 blue    red

Data:

set.seed(1)
colour <- rep(c("blue","red"),3)
day <- rep(c("mon", "tues", "weds"), each=2)
value <- sample.int(100,6,T)
dat <- tibble(colour,day,value)
like image 43
Edward Avatar answered Nov 01 '25 07:11

Edward



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!