Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

group by in dplyr and calculating percentages

Tags:

r

I have following dataframe in r

   Service      Container_Pick_Day
    ABC              0
    ABC              1
    ABC              1
    ABC              2
    ABC              NA
    ABC              0
    ABC              1
    DEF              NA
    DEF              0
    DEF              1
    DEF              1
    DEF              1
    DEF              2
    DEF              1

Column Container_Pick_Day is numeric and consist of NA values. What I want to do is calculate Service wise percentage of containers picked up on 0th day,after 1 day,2 day and so on ignoring NA values

Desired dataframe would be

 Service      Container_Pick_Day      Percentage
    ABC              0                (2/6)*100 = 33.33 
    ABC              1                (3/6)*100 = 50  
    ABC              2                (1/6)*100 = 16.67
    DEF              0                (1/6)*100 = 16.67
    DEF              1                (3/6)*100 = 50
    DEF              2                (1/6)*100 = 16.67

I did following in R,but its generating NA values in output

  df%>% 
     group_by(Service) %>%
     summarise(pick_day_perc = n()/sum(Container_Pick_Day),na.rm=T) %>% 
     as.data.frame()

Do I have to group by Service and Container_Pick_Day both ?

like image 591
Neil Avatar asked Jul 27 '17 04:07

Neil


1 Answers

Adding an answer based on all comments above provided by @nicola, @akrun and myself,

library(dplyr)

#nicola
df %>% 
 filter(!is.na(Container_Pick_Day)) %>% 
 group_by(Service,Container_Pick_Day) %>% 
 summarise(Percentage=n()) %>% 
 group_by(Service) %>% 
 mutate(Percentage=Percentage/sum(Percentage)*100)

#akrun
df %>% 
 filter(complete.cases(Container_Pick_Day)) %>% 
 count(Service, Container_Pick_Day) %>% 
 group_by(Service) %>% 
 transmute(Container_Pick_Day, Percentage=n/sum(n)*100)

#Sotos
df %>% 
 na.omit() %>% 
 group_by_all() %>% 
 summarise(ptg = n()) %>% 
 group_by(Service) %>% 
 mutate(ptg = prop.table(ptg)*100)

All resulting to,

Service Container_Pick_Day Percentage
   <fctr>              <int>      <dbl>
1     ABC                  0   33.33333
2     ABC                  1   50.00000
3     ABC                  2   16.66667
4     DEF                  0   16.66667
5     DEF                  1   66.66667
6     DEF                  2   16.66667
like image 193
Sotos Avatar answered Oct 17 '22 06:10

Sotos