Using dplyr "group_by" and "summarise" with a Custom Function to Calculate the Mode of Several Groups

Tags:

r

dplyr

Apparently dplyr's summarise function doesn't include an option for "mode". Based on the simple data frame example below, I would like to determine the mode (the most frequently occurring number) for each group of "Category". So for group "A" the mode is 22, for "B" it's both 12 and 14, and "C" has no repeating number at all.

I found some example functions online, but none of them addressed the case where a group has no repeating numbers. Do I need a custom function, or is there a mode option somewhere? I don't want to rely on other specialized packages just for their mode function. It would be nice to find an elegant and simple solution using a combination of base R, dplyr, tidy, etc.

If a custom function is used, it has to work when there are no repeating numbers, as well as when two or more numbers are tied for most frequent.
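For illustration, a common one-liner, names(which.max(table(x))), fails both of these requirements: which.max() returns only the first maximum, so ties are dropped, and a group where nothing repeats still reports a value. A quick check on the "B" and "C" values from the example data (naive_mode is a hypothetical name used only for this sketch):

```r
# Naive mode: the most frequent value according to table(), first match only
naive_mode <- function(x) names(which.max(table(x)))

b_vals <- c(12, 12, 14, 14)  # group "B": 12 and 14 are tied
c_vals <- c(8, 10, 1, 3)     # group "C": nothing repeats

naive_mode(b_vals)  # "12": the tie with 14 is silently lost
naive_mode(c_vals)  # "1": reported even though no value repeats
```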

Any help would be greatly appreciated! This seems like it should have an easy solution in R, so I was surprised to learn that there is no simple summarise_each(funs(mode)... option.
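Part of the confusion is that base R does have a function called mode(), but it returns an object's storage mode rather than the statistical mode, so funs(mode) would not give the most frequent value:

```r
# base R's mode() reports how an object is stored, not its most frequent value
mode(c(22, 22, 18))  # "numeric"
mode(c("A", "B"))    # "character"
```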

If a custom function is used, please break it down with explanations. I'm still relatively new to R functions.

Category<-c("A","B","B","C","A","A","A","B","C","B","C","C")
Number<-c(22,12,12,8,22,22,18,14,10,14,1,3)
DF<-data.frame(Category,Number)
Mike asked Jul 05 '16 03:07


1 Answer

We can use a small custom function, Mode(), inside summarise():

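library(dplyr)

Mode <- function(x) {
  ux <- unique(x)                  # distinct values, in order of first appearance
  if (!anyDuplicated(x)) {
    NA_character_                  # no value repeats, so there is no mode
  } else {
    tbl <- tabulate(match(x, ux))  # how many times each distinct value occurs
    toString(ux[tbl == max(tbl)])  # all values tied for the top count, comma-separated
  }
}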
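To break the function down, here are the intermediate values it computes for group "B" (values 12, 12, 14, 14). This sketch simply runs the same expressions outside the function, with x standing in for one group's Number values:

```r
x <- c(12, 12, 14, 14)         # the "B" group's Number values

ux <- unique(x)                # 12 14: each distinct value once
anyDuplicated(x)               # 2: position of the first repeat (0 would mean none)
match(x, ux)                   # 1 1 2 2: which distinct value each element is
tbl <- tabulate(match(x, ux))  # 2 2: how often each distinct value occurs
ux[tbl == max(tbl)]            # 12 14: every value tied for the highest count
toString(ux[tbl == max(tbl)])  # "12, 14": collapsed into a single string
```

Collapsing ties with toString() is what lets summarise() return exactly one value per group; the trade-off is that the result column is character rather than numeric.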

DF %>%
   group_by(Category) %>%
   summarise(NumberMode = Mode(Number))
#  Category NumberMode
#    <fctr>      <chr>
#1        A         22
#2        B     12, 14
#3        C       <NA>
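If one row per tied mode is preferable to a comma-separated string, a sketch that uses only dplyr verbs (a different technique from the Mode() function above) is to count each Category/Number pair and keep the most frequent pairs per group. Note that a group with no repeating value, like "C", drops out of the result here instead of showing NA:

```r
library(dplyr)

Category <- c("A","B","B","C","A","A","A","B","C","B","C","C")
Number   <- c(22,12,12,8,22,22,18,14,10,14,1,3)
DF <- data.frame(Category, Number)

modes <- DF %>%
  count(Category, Number) %>%     # one row per Category/Number pair, with its count n
  group_by(Category) %>%
  filter(n == max(n), n > 1) %>%  # keep the top count(s), but only if something repeats
  ungroup()

modes
```

If every group must appear in the output, the Mode() approach above fits better.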
akrun answered Oct 23 '22 03:10