
Combine result from top_n with an "Other" category in dplyr

Tags: r, dplyr

I have a data frame dat1:

   Country Count
1      AUS     1
2       NZ     2
3       NZ     1
4      USA     3
5      AUS     1
6      IND     2
7      AUS     4
8      USA     2
9      JPN     5
10      CN     2

First I want to sum "Count" per "Country". Then the three countries with the largest totals should be kept, together with an additional row "Others" that holds the summed count of all countries outside the top 3.

The expected outcome therefore would be:

  Country Count
1     AUS     6
2     JPN     5
3     USA     5
4  Others     7

I have tried the below code, but could not figure out how to place the "Others" row.

library(dplyr)

dat1 %>%
    group_by(Country) %>%
    summarise(Count = sum(Count)) %>%
    arrange(desc(Count)) %>%
    top_n(3, Count)

This code currently gives:

    Country Count
1     AUS     6
2     JPN     5
3     USA     5

Any help would be greatly appreciated.

Reproducible data (dput output):

dat1 <- structure(list(Country = structure(c(1L, 5L, 5L, 6L, 1L, 3L,
    1L, 6L, 4L, 2L), .Label = c("AUS", "CN", "IND", "JPN", "NZ",
    "USA"), class = "factor"), Count = c(1L, 2L, 1L, 3L, 1L, 2L,
    4L, 2L, 5L, 2L)), .Names = c("Country", "Count"), class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
abhy3 asked Jan 31 '16 12:01



2 Answers

Instead of top_n, this seems like a good case for the convenience function tally. Called with a weight and sort = TRUE, it uses summarise, sum and arrange under the hood.

Then use factor to create an "Other" category. Use the levels argument to set "Other" as the last level. "Other" will then be placed last in the table (and in any subsequent plot of the result).

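"Country" is a factor in the question's data, so Country[1:3] is wrapped in as.character below. Otherwise c() would combine the factor's integer codes, not its labels, with "Other". The wrapper is harmless if "Country" is already a character column.

library(dplyr)

group_by(dat1, Country) %>%
  # sum Count per country, largest total first
  tally(Count, sort = TRUE) %>%
  # keep the first three countries, relabel the rest as "Other"
  group_by(Country = factor(c(as.character(Country[1:3]), rep("Other", n() - 3)),
                            levels = c(as.character(Country[1:3]), "Other"))) %>%
  tally(n)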

#  Country     n
#   (fctr) (int)
#1     AUS     6
#2     JPN     5
#3     USA     5
#4   Other     7
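
To see why the levels argument matters in the code above, here is a minimal base R sketch. The vectors top3 and labels are made up for illustration and stand in for the country names produced by the first tally call.

```r
# Hypothetical stand-ins for the sorted country names from the first tally()
top3   <- c("AUS", "JPN", "USA")
labels <- c(top3, rep("Other", 3))

# Without explicit levels, factor() sorts the unique values,
# so "Other" lands between "JPN" and "USA".
default_levels <- levels(factor(labels))

# With explicit levels, the given order is kept and "Other" is the last level.
with_levels <- levels(factor(labels, levels = c(top3, "Other")))

default_levels
with_levels
```

Grouped summaries and plots follow the level order, which is why the explicit levels put "Other" at the bottom of the table.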
Henrik answered Oct 28 '22 08:10



You can use fct_lump from the forcats package:

library(dplyr)
library(forcats)

dat1 %>%
  group_by(Country = fct_lump(Country, n = 3, w = Count)) %>%
  summarize(Count = sum(Count))

This should do it. You can also change the "Other" label with the other_level argument of fct_lump.
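
As a sketch of the other_level argument, the variant below uses the label "Others" from the question's expected output. It assumes forcats 0.5.0 or later, where fct_lump_n is available as the more specific form of fct_lump for "keep the n largest". The data frame is rebuilt with data.frame() only so the example runs on its own.

```r
library(dplyr)
library(forcats)

# Same values as dat1 in the question (Country stored as a factor)
dat1 <- data.frame(
  Country = factor(c("AUS", "NZ", "NZ", "USA", "AUS", "IND", "AUS", "USA", "JPN", "CN")),
  Count   = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L)
)

res <- dat1 %>%
  # keep the 3 countries with the largest summed Count (w = Count);
  # every other country is relabelled "Others"
  group_by(Country = fct_lump_n(Country, n = 3, w = Count, other_level = "Others")) %>%
  summarise(Count = sum(Count))

res
```

The lumped level is placed last among the factor levels, so the "Others" row appears at the bottom of the summary without any extra sorting.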

deradelo answered Oct 28 '22 08:10
