Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to find top N descending values in group in dplyr

Tags:

r

I have a following dataframe in R

  Serivce     Codes
   ABS         RT
   ABS         RT
   ABS         TY
   ABS         DR
   ABS         DR
   ABS         DR
   ABS         DR
   DEF         RT
   DEF         RT
   DEF         TY
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         TY
   DEF         SE
   DEF         SE

What I want is service wise code count in descending order

  Serivce     Codes    Count
   ABS         DR        4
   ABS         RT        2 
   ABS         TY        1
   DEF         DR        4
   DEF         RT        2
   DEF         TY        2  

I am doing following in r

df%>% 
group_by(Service,Codes) %>% 
summarise(Count = n()) %>%
top_n(n=3,wt = Count) %>% 
arrange(desc(Count)) %>% 
as.data.frame()   

But,it does not give me what is intended.

like image 202
Neil Avatar asked Jul 28 '17 05:07

Neil


People also ask

How do you find the top 3 values in R?

To get the top values in an R data frame, we can use the head function and if we want the values in decreasing order then sort function will be required. Therefore, we need to use the combination of head and sort function to find the top values in decreasing order.

What does %>% do in Dplyr?

%>% is called the forward pipe operator in R. It provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).


1 Answers

We can try with count/arrange/slice

df1 %>% 
   count(Service, Codes) %>%
   arrange(desc(n)) %>% 
   group_by(Service) %>% 
   slice(seq_len(3))
# A tibble: 6 x 3
# Groups:   Service [2]
#  Service Codes     n
#    <chr> <chr> <int>
#1     ABS    DR     4
#2     ABS    RT     2
#3     ABS    TY     1
#4     DEF    DR     4
#5     DEF    RT     2
#6     DEF    SE     2

In the OP's code, we need to arrange by 'Service' too. As @Marius said in the comments, the top_n will include more number of rows if there are ties. One option is to do a second grouping with 'Service' and slice (as showed above) or after the grouping, we can filter

df1 %>% 
  group_by(Service,Codes) %>%
  summarise(Count = n()) %>%
  top_n(n=3,wt = Count)  %>%
  arrange(Service, desc(Count)) %>%
  group_by(Service) %>%
  filter(row_number() <=3)
like image 126
akrun Avatar answered Oct 14 '22 06:10

akrun