Getting the top values by group

Here's a sample data frame:

d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

I want the subset of d containing the rows with the top 5 values of x for each value of grp.

Using base R, my approach would be something like:

ordered <- d[order(d$x, decreasing = TRUE), ]
splits <- split(ordered, ordered$grp)
# head() defaults to n = 6, so ask for 5 rows explicitly
heads <- lapply(splits, head, n = 5)
do.call(rbind, heads)
##              x grp
## 1.19 0.8879631   1
## 1.4  0.8844818   1
## 1.12 0.8596197   1
## 1.26 0.8481809   1
## 1.18 0.8461516   1
## 2.31 0.9751049   2
## 2.34 0.9269764   2
## 2.57 0.8964114   2
## 2.58 0.8896466   2
## 2.45 0.8888834   2
## 3.74 0.9884852   3
## 3.73 0.9837653   3
## 3.83 0.9375398   3
## 3.64 0.9229036   3
## 3.69 0.8021373   3

Using dplyr, I expected this to work:

d %>%
  arrange_(~ desc(x)) %>%
  group_by_(~ grp) %>%
  head(n = 5)

but it only returns the overall top 5 rows.

Swapping head for top_n returns the whole of d.

d %>%
  arrange_(~ desc(x)) %>%
  group_by_(~ grp) %>%
  top_n(n = 5)

How do I get the correct subset?

asked Jan 04 '15 13:01

Richie Cotton

1 Answer

From dplyr 1.0.0, "slice_min() and slice_max() select the rows with the minimum or maximum values of a variable, taking over from the confusing top_n()."

d %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 5)
# # A tibble: 15 x 2
# # Groups:   grp [3]
#        x grp
#    <dbl> <fct>
#  1 0.994 1
#  2 0.957 1
#  3 0.955 1
#  4 0.940 1
#  5 0.900 1
#  6 0.963 2
#  7 0.902 2
#  8 0.895 2
#  9 0.858 2
# 10 0.799 2
# 11 0.985 3
# 12 0.893 3
# 13 0.886 3
# 14 0.815 3
# 15 0.812 3
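One detail worth knowing: slice_max() keeps ties by default, so a group can return more than n rows when values are tied at the cut-off. Here is a minimal sketch of how with_ties = FALSE caps each group at exactly n rows. The data frame scores is made up for illustration and is not part of the question.

```r
library(dplyr)

# Hypothetical data with tied values in each group
scores <- data.frame(
  x   = c(3, 3, 2, 1, 5, 4, 4, 4),
  grp = factor(c("a", "a", "a", "a", "b", "b", "b", "b"))
)

# Default: ties at the cut-off are kept, so group "b" returns
# its single 5 plus all three 4s
with_ties <- scores %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 2)

# with_ties = FALSE: at most n rows per group
no_ties <- scores %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 2, with_ties = FALSE)

count(with_ties, grp)
count(no_ties, grp)
```

With continuous values such as runif(), ties are very unlikely, so both forms should give the same result on the data in this question.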

Pre-dplyr 1.0.0 using top_n:

From ?top_n, about the wt argument:

"The variable to use for ordering [...] defaults to the last variable in the tbl".

The last variable in your data set is "grp", which is not the variable you wish to rank by. Within each group every row has the same value of grp, so all rows tie for first place, and because top_n keeps ties your attempt "returns the whole of d". Thus, if you wish to rank by "x" in your data set, you need to specify wt = x.

d %>%
  group_by(grp) %>%
  top_n(n = 5, wt = x)
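To see the effect of the wt default directly, the two calls can be compared side by side. This is a small sketch using the data from this answer; the names all_rows and top_rows are introduced here for illustration only.

```r
library(dplyr)

set.seed(123)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# No wt: top_n() falls back to the last column (grp) and prints
# "Selecting by grp". All rows in a group tie, so none are dropped.
all_rows <- d %>%
  group_by(grp) %>%
  top_n(n = 5)

# wt = x: rows are ranked by x within each group
top_rows <- d %>%
  group_by(grp) %>%
  top_n(n = 5, wt = x)

nrow(all_rows)
nrow(top_rows)
```

The first call should return all 90 rows and the second 15, five per group. Note that top_n() does not sort its result, so the rows come back in their original order.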

Data:

set.seed(123)
d <- data.frame(
  x = runif(90),
  grp = gl(3, 30))
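As a sanity check, the base R approach from the question (passing n = 5 to head()) can be compared against slice_max() on this data. This is a sketch rather than a definitive test; base_top and dplyr_top are names introduced here for illustration.

```r
library(dplyr)

set.seed(123)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# Base R: sort by x descending, split by group, keep 5 rows of each
ordered <- d[order(d$x, decreasing = TRUE), ]
base_top <- do.call(rbind, lapply(split(ordered, ordered$grp), head, n = 5))

# dplyr: top 5 values of x within each group
dplyr_top <- d %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 5)

# Sort before comparing so the check does not depend on row order
identical(sort(base_top$x), sort(dplyr_top$x))
```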

answered Sep 28 '22 20:09

Henrik

Tags: r, data.table, dplyr
