Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

k_th smallest element per group in R

Tags:

sorting

r

I have a data table, dt, that looks like

   location year value
       NYC 2026     1
       NYC 2026     2
       NYC 2026     3
       NYC 2026     4
       NYC 2026     5
        LA 2026     6
        LA 2026     7
        LA 2026     8
        LA 2026     9
        LA 2026    10

I would like to group them by city and year and find the 2nd smallest element in the column value, per each group, where the desired result looks like:

   location year value
        NYC  2026     2
         LA  2026     7

dt %>% grou_by(location, year) %>% nth(value, 2)

Would not work. Any help is appreciated.

The data table above can be created by:

dt <- structure(list(location = c("NYC", "NYC", "NYC","NYC", "NYC", 
                                   "LA", "LA", "LA", "LA", "LA"), 
                 year = c(2026, 2026, 2026, 2026, 2026,
                          2026, 2026, 2026, 2026, 2026),
                 value = c(1, 2, 3, 4, 5,
                           6, 7, 8, 9, 10)),
                 class = "data.table", 
                 row.names = c(NA, -10L))
like image 282
OverFlow Police Avatar asked Oct 24 '25 15:10

OverFlow Police


2 Answers

One dplyr possibility could be:

df %>%
 group_by(location) %>%
 arrange(value) %>%
 slice(2)

Here it groups by "location" column, arranges the values according "value" column and then keeps the second element.

  location  year value
  <chr>    <int> <int>
1 LA        2026     7
2 NYC       2026     2

Or if the values in "value" column could be duplicate, then you can do:

df %>%
 group_by(location) %>%
 distinct(value, .keep_all = TRUE) %>%
 arrange(value) %>%
 slice(2)

Or using filter() instead of slice():

df %>%
 group_by(location) %>%
 arrange(value) %>%
 filter(row_number() == 2)

The same considering also possible duplicates:

df %>%
 group_by(location) %>%
 distinct(value, .keep_all = TRUE) %>%
 arrange(value) %>%
 filter(row_number() == 2)

Or using filter() and dense_rank():

df %>%
 group_by(location) %>%
 filter(dense_rank(value) == 2)

The same considering also possible duplicates:

df %>%
 group_by(location) %>%
 distinct(value, .keep_all = TRUE) %>%
 filter(dense_rank(value) == 2)
like image 103
tmfmnk Avatar answered Oct 27 '25 04:10

tmfmnk


Using summarize to work with group_by:

> dt %>% group_by(location, year) %>% arrange(value) %>%  summarize(value = nth(value, 2))
# A tibble: 2 x 3
# Groups:   location [2]
  location  year value
  <chr>    <dbl> <dbl>
1 LA        2026     7
2 NYC       2026     2
like image 35
liuminzhao Avatar answered Oct 27 '25 04:10

liuminzhao



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!