I have following dataframe in r
Service Container_Pick_Day
ABC 0
ABC 1
ABC 1
ABC 2
ABC NA
ABC 0
ABC 1
DEF NA
DEF 0
DEF 1
DEF 1
DEF 1
DEF 2
DEF 1
Column Container_Pick_Day
is numeric and consist of NA
values.
What I want to do is calculate Service
wise percentage of containers picked up on 0th day,after 1 day,2 day and so on
ignoring NA
values
Desired dataframe would be
Service Container_Pick_Day Percentage
ABC 0 (2/6)*100 = 33.33
ABC 1 (3/6)*100 = 50
ABC 2 (1/6)*100 = 16.67
DEF 0 (1/6)*100 = 16.67
DEF 1 (3/6)*100 = 50
DEF 2 (1/6)*100 = 16.67
I did following in R,but its generating NA values in output
df%>%
group_by(Service) %>%
summarise(pick_day_perc = n()/sum(Container_Pick_Day),na.rm=T) %>%
as.data.frame()
Do I have to group by Service and Container_Pick_Day
both ?
Adding an answer based on all comments above provided by @nicola, @akrun and myself,
library(dplyr)
#nicola
df %>%
filter(!is.na(Container_Pick_Day)) %>%
group_by(Service,Container_Pick_Day) %>%
summarise(Percentage=n()) %>%
group_by(Service) %>%
mutate(Percentage=Percentage/sum(Percentage)*100)
#akrun
df %>%
filter(complete.cases(Container_Pick_Day)) %>%
count(Service, Container_Pick_Day) %>%
group_by(Service) %>%
transmute(Container_Pick_Day, Percentage=n/sum(n)*100)
#Sotos
df %>%
na.omit() %>%
group_by_all() %>%
summarise(ptg = n()) %>%
group_by(Service) %>%
mutate(ptg = prop.table(ptg)*100)
All resulting to,
Service Container_Pick_Day Percentage <fctr> <int> <dbl> 1 ABC 0 33.33333 2 ABC 1 50.00000 3 ABC 2 16.66667 4 DEF 0 16.66667 5 DEF 1 66.66667 6 DEF 2 16.66667
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With