 

Top "n" rows of each group using dplyr -- with different number per group

I'll use the built-in chickwts data as an example.

Here's the data; there are 6 feed types.

> head(chickwts)

  weight      feed
1    179 horsebean
2    160 horsebean
3    136 horsebean
4    227 horsebean
5    217 horsebean
6    168 horsebean

> table(chickwts$feed)

   casein horsebean   linseed  meatmeal   soybean sunflower 
       12        10        12        11        14        12 

What I want is the top rows by weight for every feed type. However, I need a different number of rows for each feed type. For example:

top_n_feed <-
  c(
    "casein" = 3,
    "horsebean" = 5,
    "linseed" = 3,
    "meatmeal" = 6,
    "soybean" = 3,
    "sunflower" = 2
  )

How can I do this using dplyr?

To get the same top n rows of each feed type by weight, I can use the code below, but I'm not sure how to extend it to a different number for each feed type.

library(dplyr)

# Top 5 rows by weight within each feed type (same n for every group)
chickwts %>%
  group_by(feed) %>%
  slice_max(order_by = weight, n = 5)
max asked Dec 30 '22 18:12



1 Answer

This isn't really something that dplyr makes easy. I'd recommend joining the per-feed counts onto the data and then filtering within each group.


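# Lookup table of feed type -> number of rows to keep
tibble(feed = names(top_n_feed), topn = top_n_feed) %>%
  inner_join(chickwts, by = "feed") %>%
  group_by(feed) %>%
  # Sort heaviest first within each feed type
  arrange(desc(weight), .by_group = TRUE) %>%
  # Keep only the first topn rows of each group
  filter(row_number() <= topn) %>%
  ungroup() %>%
  select(-topn)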
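
If you'd rather keep slice_max(), one possible alternative is to call it once per group with group_modify() and look up that group's n by name. This is only a sketch: it assumes every feed type has an entry in top_n_feed, and it sets with_ties = FALSE so that ties don't return extra rows (matching the row_number() filter above). The names top_by_feed and n_keep are just illustrative.

```r
library(dplyr)

top_n_feed <- c(casein = 3, horsebean = 5, linseed = 3,
                meatmeal = 6, soybean = 3, sunflower = 2)

top_by_feed <- chickwts %>%
  group_by(feed) %>%
  # .x is the data for one group; .y is a one-row tibble holding its key
  group_modify(function(.x, .y) {
    # Look up how many rows to keep for this feed type
    n_keep <- top_n_feed[[as.character(.y$feed)]]
    slice_max(.x, order_by = weight, n = n_keep, with_ties = FALSE)
  }) %>%
  ungroup()

# Number of rows kept per feed type
count(top_by_feed, feed)
```

Unlike the join approach, this keeps feed as a factor, but it will error if a feed type is missing from top_n_feed.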

MrFlick answered Jan 14 '23 11:01
