Explain ungroup() in dplyr

Tags: r, dplyr

If I'm working with a dataset and I want to group the data (e.g. by country), compute a summary statistic (mean()), and then ungroup() the data.frame so that it has its original dimensions (country-year) plus a new column holding each country's mean (repeated over the n years), how would I do that with dplyr? The ungroup() function doesn't return a data.frame with the original dimensions:

gapminder %>%
    group_by(country) %>%
    summarize(mn = mean(pop)) %>%
    ungroup() # returns data.frame with nrows == length(unique(gapminder$country))
Emily asked Jan 25 '18 15:01


2 Answers

ungroup() is useful if you want to do something like:

gapminder %>%
    group_by(country) %>%
    mutate(mn = pop/mean(pop)) %>%
    ungroup()

This is the case where you want a transformation that uses the statistics of an entire group. In the example above, mn is the ratio of each row's population to its country's average population. Once the data is ungrouped, any further mutate() calls compute aggregates such as mean() over the whole table rather than per country.
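To make the effect of ungroup() concrete, here is a minimal sketch. It uses a small made-up tibble (toy, a hypothetical stand-in for gapminder) so it runs with only dplyr installed; the column names ratio, overall_mean and group_mean are chosen for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder (country-year rows).
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2005, 2000, 2005),
  pop     = c(10, 20, 100, 300)
)

grouped <- toy %>%
  group_by(country) %>%
  mutate(ratio = pop / mean(pop))    # mean(pop) is computed per country

# After ungroup(), mean(pop) is taken over all rows of the table.
ungrouped <- grouped %>%
  ungroup() %>%
  mutate(overall_mean = mean(pop))

# Without ungroup(), the grouping persists and mean(pop) stays per country.
still_grouped <- grouped %>%
  mutate(group_mean = mean(pop))
```

In this toy data, overall_mean is 107.5 on every row, while group_mean is 15 for country A and 200 for country B.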

summarize() collapses each group to a single row, and ungroup() cannot bring the original rows back; it only removes the grouping. Perhaps you wanted to do:

gapminder %>%
    group_by(country) %>%
    mutate(mn = mean(pop)) %>%
    ungroup()

This creates mn as the mean for each group, repeated on every row within that group.
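The difference in row counts can be checked directly. The sketch below again assumes a small made-up tibble (toy, hypothetical) rather than gapminder itself, so that it is self-contained.

```r
library(dplyr)

# Hypothetical toy data: two countries, two years each.
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2005, 2000, 2005),
  pop     = c(10, 20, 100, 300)
)

# summarize() keeps one row per group.
summarised <- toy %>%
  group_by(country) %>%
  summarize(mn = mean(pop)) %>%
  ungroup()

# mutate() keeps every original row and adds the group mean as a column.
mutated <- toy %>%
  group_by(country) %>%
  mutate(mn = mean(pop)) %>%
  ungroup()

nrow(summarised)  # 2, one row per country
nrow(mutated)     # 4, same as nrow(toy)
```

Only the mutate() version has the country-year shape the question asks for.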

Max Candocia answered Oct 09 '22 11:10


summarize() reduced the number of rows. If you don't want to change the number of rows, use mutate() rather than summarize().

MrFlick answered Oct 09 '22 11:10