
Why does group_by() affect out-of-scope data frames?

Tags: r, dplyr

If I pass a grouped data frame to a function and rename the grouping variable inside that function, the grouping of the original data frame changes to the new name. After the function returns (the altered data frame is never assigned back), the column names of the original data frame are unchanged, but its grouping now refers to the new, non-existent name.

# test scoping of group_by() which appears to change groups
library(dplyr)

muck_up_group<-function(mydf){
  mydf<-mydf %>% rename(UhOh=Species)
}

dont_muck_up_group<-function(mydf){
  mydf<-mydf %>% ungroup()
  mydf<-mydf %>% rename(UhOh=Species)
}

data("iris")
iris<-as_tibble(iris) %>% group_by(Species)
iris
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
# 1          5.1         3.5          1.4         0.2  setosa

muck_up_group(iris) # original grouping changed to column name that doesn't exist
iris
# A tibble: 150 x 5
# Groups:   UhOh [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
# 1          5.1         3.5          1.4         0.2  setosa

#restore original state
iris<-as_tibble(iris) %>% group_by(Species)
dont_muck_up_group(iris) # original grouping preserved
iris
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
# 1          5.1         3.5          1.4         0.2  setosa

I can understand why it might be bad practice to change the name of a grouping variable, but it is permissible. This looks like an attribute of a variable being passed by reference while its content is passed by value (which, as I understand it, is what R normally does).
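For comparison, here is a minimal base R sketch of the behavior one would normally expect: changing an attribute of an argument inside a function leaves the caller's object alone. The function name `set_note` and the attribute name `note` are made up for illustration and have nothing to do with dplyr.

```r
# Hypothetical helper: modifies an attribute on its local copy of x.
set_note <- function(x) {
  attr(x, "note") <- "changed"   # triggers copy-on-modify; caller's x is untouched
  invisible(x)
}

x <- structure(1:3, note = "original")
set_note(x)                      # result deliberately not assigned

note_after <- attr(x, "note")
print(note_after)                # still "original"
```

With ordinary R semantics the attribute on `x` stays as it was, which is why the grouping change shown above is surprising.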

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] graphics  grDevices utils     datasets  stats     methods   base     

other attached packages:
 [1] lubridate_1.6.0               bindrcpp_0.2                  mFilter_0.1-3                
 [4] ggrepel_0.6.5                 reshape2_1.4.2                scales_0.4.1                 
 [7] purrr_0.2.3                   readr_1.1.1                   tidyr_0.7.0                  
[10] tibble_1.3.4                  tidyverse_1.1.1               knitr_1.17                   
[13] Rblpapi_0.3.6                 stringr_1.2.0                 rvest_0.3.2                  
[16] xml2_1.1.1                    devtools_1.13.3               dplyr_0.7.2                  
[19] plyr_1.8.4                    ggplot2_2.2.1                 PerformanceAnalytics_1.4.3541
[22] xts_0.10-0                    zoo_1.8-0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12       lattice_0.20-35    assertthat_0.2.0   rprojroot_1.2      digest_0.6.12     
 [6] psych_1.7.5        R6_2.2.2           cellranger_1.1.0   backports_1.1.0    evaluate_0.10.1   
[11] httr_1.3.1         highr_0.6          rlang_0.1.2        curl_2.8.1         lazyeval_0.2.0    
[16] readxl_1.0.0       TTR_0.23-2         tidyquant_0.5.3    rmarkdown_1.6      labeling_0.3      
[21] foreign_0.8-67     munsell_0.4.3      broom_0.4.2        compiler_3.4.0     modelr_0.1.1      
[26] pkgconfig_2.0.1    base64enc_0.1-3    mnormt_1.5-5       htmltools_0.3.6    tidyselect_0.1.1  
[31] withr_2.0.0        Quandl_2.8.0       grid_3.4.0         nlme_3.1-131       jsonlite_1.5      
[36] gtable_0.2.0       magrittr_1.5       quantmod_0.4-10    stringi_1.1.5      RColorBrewer_1.1-2
[41] tools_3.4.0        forcats_0.2.0      glue_1.1.1         hms_0.3            rsconnect_0.8.5   
[46] parallel_3.4.0     yaml_2.1.14        colorspace_1.3-2   memoise_1.1.0      bindr_0.1         
[51] haven_1.1.0       
>

Is this a bug? Thanks.

Asked by Art, Aug 29 '17 15:08




1 Answer

See @aosmith's comment above. This is covered by a dplyr issue that has since been closed.
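To see whether a particular dplyr installation still shows the behavior, something like the following sketch may help. It only compares `group_vars()` before and after the call; the helper names `grouping_survives` and `rename_species` are hypothetical, and the result depends on the installed dplyr version.

```r
library(dplyr)

# Hypothetical helper: TRUE if calling f() on a grouped tibble leaves the
# caller's grouping variables unchanged.
grouping_survives <- function(df, f) {
  before <- group_vars(df)
  f(df)                          # result deliberately discarded
  after <- group_vars(df)
  identical(before, after)
}

# Same rename as muck_up_group() in the question.
rename_species <- function(mydf) {
  mydf <- mydf %>% rename(UhOh = Species)
  invisible(mydf)
}

grouped_iris <- as_tibble(iris) %>% group_by(Species)
result <- grouping_survives(grouped_iris, rename_species)
print(result)
```

A result of `FALSE` would indicate that the original tibble's grouping was altered by the call, as described in the question.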

Answered by Art, Oct 22 '22 22:10