I have a data frame dat1:
   Country Count
1      AUS     1
2       NZ     2
3       NZ     1
4      USA     3
5      AUS     1
6      IND     2
7      AUS     4
8      USA     2
9      JPN     5
10      CN     2
First I want to sum "Count" per "Country". Then the 3 countries with the highest totals should be kept, together with an additional row "Others" that holds the summed counts of all countries outside the top 3.
The expected outcome would therefore be:
  Country Count
1     AUS     6
2     JPN     5
3     USA     5
4  Others     7
I have tried the below code, but could not figure out how to place the "Others" row.
dat1 %>%
  group_by(Country) %>%
  summarise(Count = sum(Count)) %>%
  arrange(desc(Count)) %>%
  top_n(3)
This code currently gives:
  Country Count
1     AUS     6
2     JPN     5
3     USA     5
Any help would be greatly appreciated.
dat1 <- structure(list(Country = structure(c(1L, 5L, 5L, 6L, 1L, 3L,
1L, 6L, 4L, 2L), .Label = c("AUS", "CN", "IND", "JPN", "NZ",
"USA"), class = "factor"), Count = c(1L, 2L, 1L, 3L, 1L, 2L,
4L, 2L, 5L, 2L)), .Names = c("Country", "Count"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Instead of top_n, this seems like a good case for the convenience function tally. It uses summarise, sum and arrange under the hood.
Then use factor to create an "Other" category. Use the levels argument to set "Other" as the last level, so that "Other" is placed last in the table (and in any subsequent plot of the result).
If "Country" is a factor in your original data (as it is in dat1 above), wrap Country[1:3] in as.character; otherwise c() may combine the integer codes of the factor rather than its labels.
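library(dplyr)

dat1 %>%
  group_by(Country) %>%
  tally(Count, sort = TRUE) %>%
  # as.character() keeps the country labels when Country is a factor
  group_by(Country = factor(c(as.character(Country[1:3]), rep("Other", n() - 3)),
                            levels = c(as.character(Country[1:3]), "Other"))) %>%
  tally(n)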
# Country n
# (fctr) (int)
#1 AUS 6
#2 JPN 5
#3 USA 5
#4 Other 7
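The same idea can be split into two explicit steps, which may be easier to follow. This is only a sketch, not the answer's own code: it rebuilds dat1 from the table in the question, uses the label "Others" and the column name Count to match the expected outcome, and the names totals, top and result are made up for illustration.

```r
library(dplyr)

# dat1 rebuilt from the table in the question (Country as a factor)
dat1 <- data.frame(
  Country = factor(c("AUS", "NZ", "NZ", "USA", "AUS",
                     "IND", "AUS", "USA", "JPN", "CN")),
  Count = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L)
)

# Step 1: total per country, largest total first
totals <- dat1 %>%
  group_by(Country) %>%
  summarise(Count = sum(Count)) %>%
  arrange(desc(Count))

# Step 2: keep the labels of the first three rows, relabel the rest,
# and sum again; "Others" is the last factor level, so it sorts last
top <- as.character(totals$Country[1:3])

result <- totals %>%
  mutate(Country = factor(
    ifelse(row_number() <= 3, as.character(Country), "Others"),
    levels = c(top, "Others")
  )) %>%
  group_by(Country) %>%
  summarise(Count = sum(Count))

print(result)
```

Note that ties at the cut-off (for example two countries sharing the third-largest total) are resolved by row order here, so only three countries are ever kept.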
You can use fct_lump from the forcats library:
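library(dplyr)
library(forcats)
dat1 %>%
  # naming the grouping column keeps the result header readable
  group_by(Country = fct_lump(Country, n = 3, w = Count)) %>%
  summarize(Count = sum(Count))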
This should do it. You can also change the "Other" label using the other_level parameter of fct_lump.
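As a sketch of the other_level argument, the call below relabels the lumped group as "Others" to match the wording in the question. It rebuilds dat1 from the question's table, and the name result is only illustrative.

```r
library(dplyr)
library(forcats)

# dat1 rebuilt from the table in the question (Country as a factor)
dat1 <- data.frame(
  Country = factor(c("AUS", "NZ", "NZ", "USA", "AUS",
                     "IND", "AUS", "USA", "JPN", "CN")),
  Count = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L)
)

result <- dat1 %>%
  # w = Count ranks countries by summed Count rather than by row frequency;
  # other_level sets the label of the lumped group
  group_by(Country = fct_lump(Country, n = 3, w = Count,
                              other_level = "Others")) %>%
  summarise(Count = sum(Count))

print(result)
```

The kept levels stay in their original factor order with the lumped level placed last, so the rows are not necessarily sorted by Count; add arrange(desc(Count)) if you need that ordering.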