If I'm working with a dataset and I want to group the data (e.g. by country), compute a summary statistic (mean()), and then ungroup() the data.frame to get a dataset with the original dimensions (country-year) and a new column that lists the mean for each country (repeated over n years), how would I do that with dplyr? The ungroup() function doesn't return a data.frame with the original dimensions:
library(dplyr)
library(gapminder)  # provides the gapminder data set

gapminder %>%
  group_by(country) %>%
  summarize(mn = mean(pop)) %>%
  ungroup() # returns data.frame with nrows == length(unique(gapminder$country))
ungroup() removes any grouping from a data frame. The grouping can be reinstated afterwards with group_by().
The split() function in base R divides a vector or data frame into groups defined by a factor.
The group_by() function groups the rows of a data frame by the columns passed as arguments to the call.
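To see what group_by() and ungroup() actually change, here is a minimal sketch. The data frame df is a made-up stand-in for gapminder (its values are assumptions for illustration only), and group_vars() is used to inspect which grouping columns are currently set.

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder
df <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2005, 2000, 2005),
  pop     = c(10, 20, 100, 300)
)

grouped <- df %>% group_by(country)
group_vars(grouped)      # grouping columns currently set on the data

ungrouped <- grouped %>% ungroup()
group_vars(ungrouped)    # no grouping columns remain

# Grouping is only metadata: the rows themselves are untouched
nrow(grouped)
nrow(ungrouped)
```

Neither call adds or drops rows on its own. Only the verb applied while the data is grouped decides whether the row count changes.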
ungroup() is useful if you want to do something like
gapminder %>%
  group_by(country) %>%
  mutate(mn = pop/mean(pop)) %>%
  ungroup()
where you want to apply a transformation that uses an entire group's statistics. In the above example, mn is the ratio of each row's population to its group's average population. Once the data is ungrouped, any further mutate() calls no longer use the grouping when computing aggregate statistics.
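A small sketch of that difference, using a made-up data frame rather than gapminder (the column names ratio_group and ratio_all are hypothetical, chosen only for this illustration):

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder
df <- data.frame(
  country = c("A", "A", "B", "B"),
  pop     = c(10, 20, 100, 300)
)

result <- df %>%
  group_by(country) %>%
  mutate(ratio_group = pop / mean(pop)) %>%  # mean(pop) is computed per country
  ungroup() %>%
  mutate(ratio_all = pop / mean(pop))        # mean(pop) is computed over all rows

result
```

The same expression, pop / mean(pop), gives different values in the two columns because the first mutate() runs while the data is grouped and the second runs after ungroup().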
summarize() collapses each group to a single row, and ungroup() cannot bring the dropped rows back. Perhaps you wanted to do
gapminder %>%
  group_by(country) %>%
  mutate(mn = mean(pop)) %>%
  ungroup()
This creates mn as the mean for each group, replicated for each row within that group.
The summarize() reduced the number of rows. If you didn't want to change the number of rows, then use mutate() rather than summarize().
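As a quick check of the row counts, here is a minimal sketch on a made-up country-year data frame (df and its values are assumptions for illustration, not the real gapminder data):

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder (2 countries x 2 years)
df <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2005, 2000, 2005),
  pop     = c(10, 20, 100, 300)
)

# summarize() collapses each country to one row
collapsed <- df %>%
  group_by(country) %>%
  summarize(mn = mean(pop))

# mutate() keeps every country-year row and repeats the group mean
kept <- df %>%
  group_by(country) %>%
  mutate(mn = mean(pop)) %>%
  ungroup()

nrow(collapsed)  # one row per country
nrow(kept)       # same number of rows as df
```

With mutate(), the result keeps the original country-year dimensions and simply gains the mn column.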