If I pass a grouped data frame to a function, then change the name of the grouped variable, the grouping of the original data frame gets changed to the new name. When the function returns (I am not returning the altered data frame), the names of the original data frame are unchanged but the grouping is changed to the non-existent name.
# test scoping of group_by() which appears to change groups
library(dplyr)
muck_up_group<-function(mydf){
mydf<-mydf %>% rename(UhOh=Species)
}
dont_muck_up_group<-function(mydf){
mydf<-mydf %>% ungroup()
mydf<-mydf %>% rename(UhOh=Species)
}
data("iris")
iris<-as_tibble(iris) %>% group_by(Species)
iris
# A tibble: 150 x 5
# Groups: Species [3]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fctr>
# 1 5.1 3.5 1.4 0.2 setosa
muck_up_group(iris) # original grouping changed to column name that doesn't exist
iris
# A tibble: 150 x 5
# Groups: UhOh [3]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fctr>
# 1 5.1 3.5 1.4 0.2 setosa
#restore original state
iris<-as_tibble(iris) %>% group_by(Species)
dont_muck_up_group(iris) # original grouping preserved
iris
# A tibble: 150 x 5
# Groups: Species [3]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fctr>
# 1 5.1 3.5 1.4 0.2 setosa
I can understand why it might be bad practice to change the name of a grouping variable but it is permissible. This seems to be an example of an attribute of a variable being passed by reference when the content is being passed by value (as we understand R does normally).
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] lubridate_1.6.0 bindrcpp_0.2 mFilter_0.1-3
[4] ggrepel_0.6.5 reshape2_1.4.2 scales_0.4.1
[7] purrr_0.2.3 readr_1.1.1 tidyr_0.7.0
[10] tibble_1.3.4 tidyverse_1.1.1 knitr_1.17
[13] Rblpapi_0.3.6 stringr_1.2.0 rvest_0.3.2
[16] xml2_1.1.1 devtools_1.13.3 dplyr_0.7.2
[19] plyr_1.8.4 ggplot2_2.2.1 PerformanceAnalytics_1.4.3541
[22] xts_0.10-0 zoo_1.8-0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 lattice_0.20-35 assertthat_0.2.0 rprojroot_1.2 digest_0.6.12
[6] psych_1.7.5 R6_2.2.2 cellranger_1.1.0 backports_1.1.0 evaluate_0.10.1
[11] httr_1.3.1 highr_0.6 rlang_0.1.2 curl_2.8.1 lazyeval_0.2.0
[16] readxl_1.0.0 TTR_0.23-2 tidyquant_0.5.3 rmarkdown_1.6 labeling_0.3
[21] foreign_0.8-67 munsell_0.4.3 broom_0.4.2 compiler_3.4.0 modelr_0.1.1
[26] pkgconfig_2.0.1 base64enc_0.1-3 mnormt_1.5-5 htmltools_0.3.6 tidyselect_0.1.1
[31] withr_2.0.0 Quandl_2.8.0 grid_3.4.0 nlme_3.1-131 jsonlite_1.5
[36] gtable_0.2.0 magrittr_1.5 quantmod_0.4-10 stringi_1.1.5 RColorBrewer_1.1-2
[41] tools_3.4.0 forcats_0.2.0 glue_1.1.1 hms_0.3 rsconnect_0.8.5
[46] parallel_3.4.0 yaml_2.1.14 colorspace_1.3-2 memoise_1.1.0 bindr_0.1
[51] haven_1.1.0
>
Bug? Thanks.
Groupby Function in R – group_by is used to group the dataframe in R. Dplyr package in R is provided with group_by() function which groups the dataframe by multiple columns with mean, sum and other functions like count, maximum and minimum.
group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
See @aosmith’s comment above. Dplyr closed issue.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With