Multi-group rankings using data table package

Question

I have a data frame containing a group identifier and a score for an observation from that group for an activity. Both group letters and scores can repeat.

library(tidyverse)
library(data.table)

set.seed(1)

dat <- data.frame(rowid = 1:1000, 
                  grp = sample(LETTERS[1:20], 1000, replace = T),
                  score = sample(1:5, 1000, replace = T))

For each group, I need to rank their distinct scores in descending order; for each unique group and score combination, I would like to retain a single row. Ideally, this row would be the row with the smallest rowid, but this is not absolutely necessary. If it is more efficient to randomly select a row from each group and score combination, that works just as well.

dat %>% 
  group_by(grp) %>% 
  mutate(score_rank = dense_rank(desc(score))) %>% 
  arrange(grp, score_rank, rowid) %>% ## this step not completely necessary (see above)
  group_by(grp, score_rank) %>% 
  mutate(score_rank_row = row_number()) %>% 
  filter(score_rank_row == 1) %>% 
  arrange(grp, -score)

The code above accomplishes what I want to do, but in reality, my actual data frame has 20m+ rows, has 5,000 values for group, and each group can have hundreds of scores, many of which repeat. As a result, when I try to use the solution presented above for my actual use case, my computer keeps running and running without finishing the operation.

My understanding is that data.table is much more efficient at handling these types of tasks, but I cannot for the life of me figure out the syntax to use it. Can anyone help me out?

akrun · Accepted Answer

With data.table, we can use

library(data.table)
setDT(dat)[, score_rank := frank(-score, ties.method =  'dense'), grp]
setkey(dat, grp, score_rank, rowid)
dat[, score_rank_row := rowid(grp, score_rank)][
           score_rank_row == 1][order(grp, -score)]

Multi-group rankings using data table package

Tags:

r

data.table

djc55

1 Answers

akrun

Recent Activity

Donate For Us

Multi-group rankings using data table package

Tags:

r

data.table

djc55

1 Answers

akrun

Related questions

Recent Activity

Donate For Us