
Why does cur_data() within summarize() return df_slice() error?

Tags: r, dplyr, tidyverse

I ran into trouble today when using cur_data() within summarize().

Example data:

library(tidyverse)

dat <- tibble(id = 1:6,
              type = c(1, 1, 2, 2, 3, 3),
              value = c(2, 4, 6, 8, 7, NA))

This first pipeline throws an error, mentioning df_slice():

dat %>%
  group_by(type) %>%
  summarize(mean = mean(value),
            n = length(cur_data() %>% filter(!is.na(value)) %>% pull(id) %>% unique()),
            .groups = "drop")
#> Error in `summarize()`:
#> ! Problem while computing `n = length(...)`.
#> ℹ The error occurred in group 1: type = 1.
#> Caused by error:
#> ! Internal error in `df_slice()`: Columns must match the data frame size.

However, switching the order of the summary stats within summarize() avoids the error:

dat %>%
  group_by(type) %>%
  summarize(n = length(cur_data() %>% filter(!is.na(value)) %>% pull(id) %>% unique()),
            mean = mean(value),
            .groups = "drop")
#> # A tibble: 3 × 3
#>    type     n  mean
#>   <dbl> <int> <dbl>
#> 1     1     2     3
#> 2     2     2     7
#> 3     3     1    NA

Additionally, piping cur_data() into as.data.frame() also avoids the error:

dat %>%
  group_by(type) %>%
  summarize(mean = mean(value),
            n = length(cur_data() %>% as.data.frame() %>% filter(!is.na(value)) %>% pull(id) %>% unique()),
            .groups = "drop")
#> # A tibble: 3 × 3
#>    type  mean     n
#>   <dbl> <dbl> <int>
#> 1     1     3     2
#> 2     2     7     2
#> 3     3    NA     1
Created on 2022-02-15 by the reprex package (v2.0.1)

Why can I not use the first example syntax? Ultimately I calculated anything that required cur_data() within mutate() and just kept the first() observation within a later summarize() call, but I'd like to know what I'm missing about summarize().
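
For reference, the mutate()-then-first() workaround mentioned above might look roughly like the sketch below. The exact code is not shown in the question, so the name res_mutate and the layout are assumptions for illustration. It also assumes a dplyr version where cur_data() is still available (it was deprecated in dplyr 1.1.0 in favor of pick(), so newer versions may warn).

```r
library(dplyr, warn.conflicts = FALSE)

dat <- tibble(id = 1:6,
              type = c(1, 1, 2, 2, 3, 3),
              value = c(2, 4, 6, 8, 7, NA))

# res_mutate is a hypothetical name used only for this sketch
res_mutate <- dat %>%
  group_by(type) %>%
  # Inside mutate() every column has one value per row of the group,
  # so cur_data() is a well-formed data frame at this point
  mutate(n = length(cur_data() %>% filter(!is.na(value)) %>% pull(id) %>% unique())) %>%
  # n is constant within each group, so first() just picks that value
  summarize(mean = mean(value),
            n = first(n),
            .groups = "drop")

res_mutate
```

The grouping set by group_by() carries through mutate(), so the later summarize() still collapses to one row per type.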

Additional session info:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20.6.0 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /opt/homebrew/Cellar/r/4.1.2/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reprex_2.0.1         palmerpenguins_0.1.0 forcats_0.5.1        stringr_1.4.0        readr_2.1.2         
 [6] tibble_3.1.6         ggplot2_3.3.5        tidyverse_1.3.1      tidyr_1.2.0          purrr_0.3.4         
[11] dplyr_1.0.8
asked May 25 '26 03:05 by billybarc


1 Answer

This is an open issue with dplyr: https://github.com/tidyverse/dplyr/issues/6138

To paraphrase the discussion in the GitHub issue: the problem is that cur_data() includes the previously summarised column (in this case, mean) without recycling it to match the number of rows in the group's data frame. That makes cur_data() essentially a malformed data frame.

In your case, using as.data.frame() solves the problem because it does the recycling to make mean match the rest of the columns in length. Putting the statements in a different order also solves it, because at that point cur_data() doesn't include any new columns yet.

library(dplyr, warn.conflicts = FALSE)

dat <- tibble(
  id = 1:6,
  type = c(1, 1, 2, 2, 3, 3),
  value = c(2, 4, 6, 8, 7, NA)
)

dat %>%
  group_by(type) %>%
  summarize(
    mean = mean(value),
    str(cur_data())
  )
#> tibble [2 x 3] (S3: tbl_df/tbl/data.frame)
#>  $ id   : int [1:2] 1 2
#>  $ value: num [1:2] 2 4
#>  $ mean : num 3
#> tibble [2 x 3] (S3: tbl_df/tbl/data.frame)
#>  $ id   : int [1:2] 3 4
#>  $ value: num [1:2] 6 8
#>  $ mean : num 7
#> tibble [2 x 3] (S3: tbl_df/tbl/data.frame)
#>  $ id   : int [1:2] 5 6
#>  $ value: num [1:2] 7 NA
#>  $ mean : num NA
#> # A tibble: 3 x 2
#>    type  mean
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2     7
#> 3     3    NA
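
If the goal is only to count distinct non-missing ids per group, one way to sidestep the issue is to not call cur_data() at all and work on the column vectors directly. This is a minimal sketch, not something taken from the GitHub issue; the name res is just for illustration, and n_distinct() is the dplyr helper for counting unique values.

```r
library(dplyr, warn.conflicts = FALSE)

dat <- tibble(
  id = 1:6,
  type = c(1, 1, 2, 2, 3, 3),
  value = c(2, 4, 6, 8, 7, NA)
)

# res is a hypothetical name used only for this sketch
res <- dat %>%
  group_by(type) %>%
  summarize(
    mean = mean(value),
    # Subset id to rows with a non-missing value, then count unique ids.
    # No data frame is built here, so the unrecycled mean column is never touched.
    n = n_distinct(id[!is.na(value)]),
    .groups = "drop"
  )

res
```

Because id and value are referenced as plain vectors for the current group, the order of the expressions inside summarize() should no longer matter here.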
answered May 27 '26 18:05 by Mikko Marttila


