When I first started programming in R, I would often use dplyr's count().
library(tidyverse)
mtcars %>% count(cyl)
Once I started using apply functions, I began running into issues with count(). If I simply added ungroup() to the end of my count() calls, the problems would go away.
I don't have a reproducible example to show. But can somebody explain what the issue likely was, why ungroup() always fixed it, and whether there are any drawbacks to consistently using ungroup() after every count(), or after any group_by()? Of course, I'm assuming I no longer need the data grouped after it's counted or summarized.
mtcars %>% count(cyl) %>% ungroup()
If you forget to ungroup() data, later data manipulation steps may produce errors or unexpected results. Always ungroup() when you've finished with your grouped calculations.
After grouping data with group_by(), there may be a need to return to a non-grouped form. Running ungroup() will drop any grouping. The grouping can be reinstated by calling group_by() again.
To get a count by group in R with dplyr, you can use group_by() together with summarise(): group_by() returns a grouped_df (a grouped data frame), and calling summarise() with n() on it gives the count for each group.
You can use the ungroup() function in dplyr to ungroup rows after using the group_by() function to summarize a variable by group.
The issues you used to run into were from an old behavior of count().
Up to dplyr 0.5.0, if you did:
mtcars %>%
count(cyl, wt)
The result would still be grouped by the cyl column. This means, for example, that if you followed it with something like summarize(mean(n)), you would have gotten one row for each cyl when you may have expected one row overall. The issue would be fixed if you put %>% ungroup() after the count.
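Since current dplyr no longer behaves this way, here is a minimal sketch that emulates the old result with group_by() and summarise() to show what the leftover grouping does. It assumes dplyr 1.0.0 or later for the .groups argument; the object names (counted, still_grouped, ungrouped) are made up for illustration.

```r
library(dplyr)

# Emulate the old count(cyl, wt) result: summarising once peels off only
# the last grouping variable (wt), so the result stays grouped by cyl.
counted <- mtcars %>%
  group_by(cyl, wt) %>%
  summarise(n = n(), .groups = "drop_last")

group_vars(counted)  # the grouping that is still attached

# With the leftover grouping, a summary returns one row per cyl.
still_grouped <- counted %>%
  summarise(mean_n = mean(n))

# After ungroup(), the same summary returns a single row overall.
ungrouped <- counted %>%
  ungroup() %>%
  summarise(mean_n = mean(n))

nrow(still_grouped)
nrow(ungrouped)
```

The only difference between the two pipelines is the ungroup() call, which is why adding it made the surprising results disappear.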
This behavior was changed in dplyr 0.7.0 (released in June 2017), such that count() preserves the grouping of its input (meaning mtcars %>% count(wt, cyl) now returns an ungrouped table). This is likely why you're no longer able to reproduce the problems, and it means you no longer need to do ungroup() after a count().
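A quick way to check this on your own installation is to inspect the grouping of the result. This is a small sketch assuming a recent dplyr; flat and kept are illustrative names, and am is just an arbitrary column chosen as a pre-existing group.

```r
library(dplyr)

# Ungrouped input: count() returns an ungrouped table.
flat <- mtcars %>% count(cyl, wt)
is_grouped_df(flat)

# Grouped input: count() keeps the grouping the input already had.
kept <- mtcars %>%
  group_by(am) %>%
  count(cyl)
group_vars(kept)
```

In the second case the result is still grouped by am, so ungroup() remains useful whenever the data going into count() was already grouped.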
Note that you may still need to do ungroup() after a group_by() and summarize():
mtcars %>%
group_by(cyl, wt) %>%
summarize(n = n())
returns a tibble still grouped by cyl:
# A tibble: 30 x 3
# Groups:   cyl [?]
     cyl    wt     n
   <dbl> <dbl> <int>
 1     4  1.51     1
 2     4  1.62     1
 3     4  1.84     1
 4     4  1.94     1
 5     4  2.14     1
 6     4  2.2      1
 7     4  2.32     1
 8     4  2.46     1
 9     4  2.78     1
10     4  3.15     1
# ... with 20 more rows
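To get rid of that remaining grouping, two options are sketched here: an explicit ungroup(), or the .groups argument of summarise(). The .groups argument assumes dplyr 1.0.0 or later; via_ungroup and via_groups_arg are illustrative names.

```r
library(dplyr)

# Option 1: drop the leftover grouping explicitly.
via_ungroup <- mtcars %>%
  group_by(cyl, wt) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  ungroup()

# Option 2: ask summarise() to drop all grouping itself.
via_groups_arg <- mtcars %>%
  group_by(cyl, wt) %>%
  summarise(n = n(), .groups = "drop")

group_vars(via_ungroup)     # no grouping variables left
group_vars(via_groups_arg)  # no grouping variables left
```

Both results hold the same counts; which form to use is mostly a matter of taste. Calling ungroup() on data that is already ungrouped simply returns it unchanged, so adding it defensively costs little.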