Why does group_by() affect out-of-scope data frames?




If I pass a grouped data frame to a function and rename the grouping variable inside it, the grouping of the original data frame changes to the new name. After the function returns (I am not returning the altered data frame), the column names of the original data frame are unchanged, but its grouping refers to the new, non-existent name.

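# test scoping of group_by() which appears to change groups
library(tidyverse)
# renames the grouping column while mydf is still grouped
muck_up_group <- function(mydf) {
  mydf <- mydf %>% rename(UhOh = Species)
}

# ungroups first, then renames
dont_muck_up_group <- function(mydf) {
  mydf <- mydf %>% ungroup()
  mydf <- mydf %>% rename(UhOh = Species)
}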

iris <- as_tibble(iris) %>% group_by(Species)
iris
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
# 1          5.1         3.5          1.4         0.2  setosa

muck_up_group(iris) # grouping of the original now refers to a column name that doesn't exist
iris
# A tibble: 150 x 5
# Groups:   UhOh [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
# 1          5.1         3.5          1.4         0.2  setosa

# restore original state
iris <- as_tibble(iris) %>% group_by(Species)
dont_muck_up_group(iris) # original grouping preserved
iris
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
# 1          5.1         3.5          1.4         0.2  setosa
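
Rather than reading the printed header, the same check can be made programmatically by comparing `group_vars()` before and after the call. This is only a sketch: it assumes a dplyr version that provides `group_vars()`, and `rename_inside` and `grouped_iris` are hypothetical names standing in for `muck_up_group` and the grouped `iris` above.

```r
library(dplyr)

# Hypothetical stand-in for muck_up_group(): renames the grouping column
# on its local copy only and returns nothing visible.
rename_inside <- function(mydf) {
  mydf <- mydf %>% rename(UhOh = Species)
  invisible(NULL)
}

# Separate name so the built-in iris dataset is not masked.
grouped_iris <- as_tibble(iris) %>% group_by(Species)

before <- group_vars(grouped_iris)
rename_inside(grouped_iris)
after <- group_vars(grouped_iris)

# TRUE means the caller's grouping survived the call; the behaviour
# described in this question corresponds to FALSE.
identical(before, after)
```

If the last line gives `FALSE`, the installed dplyr shows the behaviour reported here.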

I can understand why it might be bad practice to change the name of a grouping variable, but it is permissible. This looks like an attribute of a variable being passed by reference while its content is passed by value (which, as we understand it, is what R normally does).
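
For comparison, here is a minimal base R sketch of the behaviour I would expect, with an attribute changed inside a function staying local to that function. `set_local_attr` and the `note` attribute are hypothetical names used only for illustration.

```r
# Base R copy-on-modify: changing an attribute inside a function
# affects only the function's own copy of the object.
set_local_attr <- function(x) {
  attr(x, "note") <- "changed inside"
  invisible(NULL)
}

v <- structure(1:3, note = "original")
set_local_attr(v)

attr(v, "note")  # still "original"
```

Code written in C or C++ can modify an object in place and bypass this copy. That would be consistent with what I see above, but whether it is what actually happens here is an assumption on my part.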

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] graphics  grDevices utils     datasets  stats     methods   base     

other attached packages:
 [1] lubridate_1.6.0               bindrcpp_0.2                  mFilter_0.1-3                
 [4] ggrepel_0.6.5                 reshape2_1.4.2                scales_0.4.1                 
 [7] purrr_0.2.3                   readr_1.1.1                   tidyr_0.7.0                  
[10] tibble_1.3.4                  tidyverse_1.1.1               knitr_1.17                   
[13] Rblpapi_0.3.6                 stringr_1.2.0                 rvest_0.3.2                  
[16] xml2_1.1.1                    devtools_1.13.3               dplyr_0.7.2                  
[19] plyr_1.8.4                    ggplot2_2.2.1                 PerformanceAnalytics_1.4.3541
[22] xts_0.10-0                    zoo_1.8-0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12       lattice_0.20-35    assertthat_0.2.0   rprojroot_1.2      digest_0.6.12     
 [6] psych_1.7.5        R6_2.2.2           cellranger_1.1.0   backports_1.1.0    evaluate_0.10.1   
[11] httr_1.3.1         highr_0.6          rlang_0.1.2        curl_2.8.1         lazyeval_0.2.0    
[16] readxl_1.0.0       TTR_0.23-2         tidyquant_0.5.3    rmarkdown_1.6      labeling_0.3      
[21] foreign_0.8-67     munsell_0.4.3      broom_0.4.2        compiler_3.4.0     modelr_0.1.1      
[26] pkgconfig_2.0.1    base64enc_0.1-3    mnormt_1.5-5       htmltools_0.3.6    tidyselect_0.1.1  
[31] withr_2.0.0        Quandl_2.8.0       grid_3.4.0         nlme_3.1-131       jsonlite_1.5      
[36] gtable_0.2.0       magrittr_1.5       quantmod_0.4-10    stringi_1.1.5      RColorBrewer_1.1-2
[41] tools_3.4.0        forcats_0.2.0      glue_1.1.1         hms_0.3            rsconnect_0.8.5   
[46] parallel_3.4.0     yaml_2.1.14        colorspace_1.3-2   memoise_1.1.0      bindr_0.1         
[51] haven_1.1.0       

Bug? Thanks.

1 Answer

See @aosmith’s comment above. The corresponding dplyr issue has been closed.

