I am trying to assign a ranking across a group, and apply the ranking across the whole group My data look like this:
colour <- rep(c("blue", "red"), 3)
day <- rep(c("mon", "tues", "weds"), each = 2)
value <- sample.int(100, 6, TRUE)
dat <- tibble::tibble(colour, day, value)
library(dplyr)
dat %>%
group_by(day) %>%
mutate(rank=rank(-value)) %>%
mutate(top = case_when(rank == 1~colour)) %>%
mutate(second = case_when(rank == 2~colour))
# A tibble: 6 × 6
# Groups: day [3]
colour day value rank top second
<chr> <chr> <int> <dbl> <chr> <chr>
1 blue mon 34 1 blue NA
2 red mon 16 2 NA red
3 blue tues 24 2 NA blue
4 red tues 36 1 red NA
5 blue weds 87 1 blue NA
6 red weds 12 2 NA red
What I am hoping for is that when a colour is the highest value for a particular day, then all observations of that day will display the top colour, idem for second, etc. such that:
# A tibble: 6 × 6
# Groups: day [3]
colour day value rank top second
<chr> <chr> <int> <dbl> <chr> <chr>
1 blue mon 34 1 blue red
2 red mon 16 2 blue red
3 blue tues 24 2 red blue
4 red tues 36 1 red blue
5 blue weds 87 1 blue red
6 red weds 12 2 blue red
A method specific to just top and second can be done with:
# your data
dat <- structure(list(colour = c("blue", "red", "blue", "red", "blue", "red"), day = c("mon", "mon", "tues", "tues", "weds", "weds"), value = c(34L, 16L, 24L, 36L, 87L, 12L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
library(dplyr)
dat |>
mutate(
.by = day,
top = colour[which.max(value)],
second = colour[rank(value, ties.method = "first")][1]
)
# colour day value top second
# 1 blue mon 34 blue red
# 2 red mon 16 blue red
# 3 blue tues 24 red blue
# 4 red tues 36 red blue
# 5 blue weds 87 blue red
# 6 red weds 12 blue red
That demonstrates that we can use which.max(.) for the first, but we really need rank() to determine second place.
We can generalize that to n-th place with
dat |>
mutate(
.by = day,
as.data.frame(lapply(setNames(nm = 1:2),
function(p) colour[p == rank(-value, ties.method = "first")][1]),
check.names = FALSE)
)
# colour day value 1 2
# 1 blue mon 34 blue red
# 2 red mon 16 blue red
# 3 blue tues 24 red blue
# 4 red tues 36 red blue
# 5 blue weds 87 blue red
# 6 red weds 12 blue red
For the names, we can either:
|> rename(top=`1`, second=`2`) to the pipesetNames with setNames(1:2, c("top", "second"))For the record, the trailing [1] on colour[rank(.)] ensures that we always get exactly one. If a group does not have enough rows to have a second place, for instance, then without [1] it will error, because colour[rank(.) == 2] returns length-0. When adding [1] after that, when it returns length-0 then the [1] returns NA. Try setNames(nm=1:3) to the above to see what I mean, both with and without the [1].
Here's one way using pivoting. It should work for third and fourth etc. ranks too.
library(tidyr)
library(dplyr)
inner_join(dat,
dat %>%
mutate(rank=rank(-value, ties="first"), .by=day) %>%
pivot_wider(id_cols=day,
values_from=colour,
names_from=rank,
names_prefix="colour")) %>%
select(names(dat), sort(names(.)[-c(1:3)]))
# A tibble: 6 × 5
colour day value colour1 colour2
<chr> <chr> <int> <chr> <chr>
1 blue mon 68 blue red
2 red mon 39 blue red
3 blue tues 1 red blue
4 red tues 34 red blue
5 blue weds 87 blue red
6 red weds 43 blue red
Data:
set.seed(1)
colour <- rep(c("blue","red"),3)
day <- rep(c("mon", "tues", "weds"), each=2)
value <- sample.int(100,6,T)
dat <- tibble(colour,day,value)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With