
In R dplyr, why do I need to ungroup() after I count()?

When I first started programming in R I would often use dplyr count().

library(tidyverse)    
mtcars %>% count(cyl)

Once I started using apply functions, I began running into issues with count(). If I simply added ungroup() to the end of my count() calls, the problems would go away.

I don't have a reproducible example to show. But can somebody explain what the issue likely was, why ungroup() always fixed it, and whether there are any drawbacks to consistently using ungroup() after every count(), or after any group_by()? Of course, I'm assuming I no longer need the data grouped after it's counted or summarized.

mtcars %>% count(cyl) %>% ungroup()
stackinator asked Jul 18 '18 14:07





1 Answer

The issues you used to run into were from an old behavior of count(). Up to dplyr 0.5.0, if you did:

mtcars %>%
  count(cyl, wt)

The result would still be grouped by the cyl column. This means, for example, that if you followed it with something like summarize(mean(n)), you would have gotten one row for each cyl when you may have expected one row overall. Putting %>% ungroup() after the count fixed the issue.
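
The old grouped result can be imitated on a current dplyr by grouping the counts by cyl explicitly. This is only a sketch of the effect, not the old code path itself; it assumes dplyr is installed, and the names counts, old_style and mean_n are made up for illustration:

```r
library(dplyr)

# Counts per (cyl, wt) combination; ungrouped on current dplyr
counts <- mtcars %>% count(cyl, wt)

# Assumption: imitate the pre-0.7.0 result by grouping by cyl by hand
old_style <- counts %>% group_by(cyl)

# Grouped input: summarize() returns one row per cyl value
grouped_result <- old_style %>% summarize(mean_n = mean(n))

# After ungroup(): summarize() collapses everything into a single row
ungrouped_result <- old_style %>% ungroup() %>% summarize(mean_n = mean(n))

nrow(grouped_result)
nrow(ungrouped_result)
```

The two nrow() calls differ because mtcars has three distinct cyl values, so the grouped version yields three rows while the ungrouped version yields one.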

This behavior was changed in dplyr 0.7.0 (released in June 2017), such that count() now preserves the grouping of its input. Since mtcars is ungrouped, mtcars %>% count(wt, cyl) now returns an ungrouped table. This is likely why you're no longer able to reproduce the problems, and it means you no longer need to call ungroup() after a count() on ungrouped data.
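
One way to check this on your own installation is to inspect the grouping variables of the result. A minimal sketch, assuming dplyr 0.7.0 or later; the names counted and counted_grouped are made up for illustration:

```r
library(dplyr)

# Ungrouped input: the count() result should carry no grouping
counted <- mtcars %>% count(cyl, wt)
is_grouped_df(counted)
group_vars(counted)

# Calling ungroup() on an already ungrouped table leaves it ungrouped,
# so a defensive ungroup() costs nothing beyond the extra call
group_vars(ungroup(counted))

# Grouped input: count() keeps the grouping it was given
counted_grouped <- mtcars %>% group_by(cyl) %>% count(wt)
group_vars(counted_grouped)
```

The last case is the one where ungroup() after count() still matters: if the input was grouped, the output is too.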


Note that you may still need to do ungroup() after a group_by() and summarize():

mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n())

returns a tibble still grouped by cyl:

# A tibble: 30 x 3
# Groups:   cyl [?]
     cyl    wt     n
   <dbl> <dbl> <int>
 1     4  1.51     1
 2     4  1.62     1
 3     4  1.84     1
 4     4  1.94     1
 5     4  2.14     1
 6     4  2.2      1
 7     4  2.32     1
 8     4  2.46     1
 9     4  2.78     1
10     4  3.15     1
# ... with 20 more rows
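
To get an ungrouped result here, either add ungroup() or ask summarize() to drop the groups itself. A minimal sketch, assuming dplyr 1.0.0 or later (where the .groups argument is available); the names by_cyl_wt, dropped and ungrouped are made up for illustration:

```r
library(dplyr)

# "drop_last" spells out the default for one-row-per-group summaries:
# only the last grouping variable (wt) is peeled off
by_cyl_wt <- mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n(), .groups = "drop_last")
group_vars(by_cyl_wt)

# Option 1: drop all grouping inside summarize()
dropped <- mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n(), .groups = "drop")
group_vars(dropped)

# Option 2: remove the remaining grouping afterwards
ungrouped <- by_cyl_wt %>% ungroup()
group_vars(ungrouped)
```

Both options give a table with the same columns and no grouping; which one to use is mostly a matter of taste.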
David Robinson answered Oct 07 '22 23:10
