
How does dplyr::slice_min / dplyr::slice_max handle NA values with grouped data?

I'm wondering if anyone can explain the behavior of `dplyr::slice_min()` / `dplyr::slice_max()` with regard to the `with_ties` argument. For grouped data, why does the function exclude NA values when `with_ties = TRUE` but include them when `with_ties = FALSE`? Reprex below:

```r
library(tidyverse)

tbl <- tibble(ID = rep(c("a","b","c","d"), each = 3),
              measure = c(NA, NA, NA, NA, 1, 1, 2, 3, 4, NA, NA, NA))

tbl |> 
  group_by(ID) |> 
  slice_max(measure, with_ties = TRUE)

# A tibble: 3 × 2
# Groups:   ID [2]
  ID    measure
  <chr>   <dbl>
1 b           1
2 b           1
3 c           4

tbl |> 
  group_by(ID) |> 
  slice_max(measure, with_ties = FALSE)

# A tibble: 4 × 2
# Groups:   ID [4]
  ID    measure
  <chr>   <dbl>
1 a          NA
2 b           1
3 c           4
4 d          NA
```
asked Oct 14 '25 at 14:10 by trevin_flick



1 Answer

This inconsistency seems to have been acknowledged very recently (23 March 2022) in this GitHub pull request, but the change has not been made yet:

> When the with_ties argument was set to FALSE NAs w[h]ere not ignored anymore. This PR fixes that.
>
> The default behavior should be to ignore NAs.


In the meantime, you can drop the rows with NA afterwards using `tidyr::drop_na()`:

```r
tbl |> 
  group_by(ID) |> 
  slice_max(measure, with_ties = FALSE) |> 
  drop_na()
```
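Alternatively, here is a minimal sketch (not taken from the pull request) that removes the missing measurements *before* slicing, using `dplyr::filter()` with `is.na()`. Because `slice_max()` then never sees an NA, both `with_ties` settings work on the same data; groups that contain only NA (here `a` and `d`) simply drop out of the result. It assumes dplyr >= 1.0.0 (for `slice_max()`) and R >= 4.1 (for the native pipe).

```r
library(dplyr)

tbl <- tibble(ID = rep(c("a","b","c","d"), each = 3),
              measure = c(NA, NA, NA, NA, 1, 1, 2, 3, 4, NA, NA, NA))

# Filter out NA first so slice_max() only ranks non-missing values.
# Groups "a" and "d" are all NA, so they vanish from the output.
res <- tbl |>
  filter(!is.na(measure)) |>
  group_by(ID) |>
  slice_max(measure, with_ties = FALSE)

res
```

With `with_ties = TRUE` instead, the same pipeline would keep both tied rows for group `b`, matching the first output in the question.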
answered Oct 17 '25 at 09:10 by Maël
