I am new to dplyr and trying to do the following transformation without any luck. I've searched across the internet and I have found examples to do the same in ddply but I'd like to use dplyr.
I have the following data:
   month  type count
1  Feb-14 bbb    341
2  Feb-14 ccc    527
3  Feb-14 aaa   2674
4  Mar-14 bbb    811
5  Mar-14 ccc   1045
6  Mar-14 aaa   4417
7  Apr-14 bbb   1178
8  Apr-14 ccc   1192
9  Apr-14 aaa   4793
10 May-14 bbb    916
.. ...    ...    ...
I want to use dplyr to calculate the percentage of each type (aaa, bbb, ccc) at a month level i.e.
   month  type count per
1  Feb-14 bbb    341 9.6%
2  Feb-14 ccc    527 14.87%
3  Feb-14 aaa   2674 ..
.. ...    ...    ...
I've tried
data %>%
  group_by(month, type) %>%
  summarise(count / sum(count))
This gives a 1 as each value. How do I make the sum(count) sum across all the types in the month?
Try
library(dplyr)
data %>%
  group_by(month) %>%
  mutate(countT = sum(count)) %>%
  # dplyr >= 1.0.0 spells this argument `.add`; older versions used `add`
  group_by(type, .add = TRUE) %>%
  mutate(per = paste0(round(100 * count / countT, 2), '%'))
Or, more simply, without creating an additional column:
data %>%
  group_by(month) %>%
  mutate(per = 100 * count / sum(count)) %>%
  ungroup()
We could also use left_join after summarising the sum(count) by 'month'.
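The left_join route is not spelled out above, so here is a minimal sketch of how it might look. The sample rows are the first six from the question, and the names totals, countT and res are only illustrative choices, not anything dplyr requires.

```r
library(dplyr)

# First six rows of the sample data from the question
data <- data.frame(
  month = c("Feb-14", "Feb-14", "Feb-14", "Mar-14", "Mar-14", "Mar-14"),
  type  = c("bbb", "ccc", "aaa", "bbb", "ccc", "aaa"),
  count = c(341, 527, 2674, 811, 1045, 4417)
)

# One row per month holding that month's total (countT is an illustrative name)
totals <- data %>%
  group_by(month) %>%
  summarise(countT = sum(count))

# Attach the monthly total to every original row, then compute the share
res <- data %>%
  left_join(totals, by = "month") %>%
  mutate(per = paste0(round(100 * count / countT, 2), "%"))

res
```

This keeps the totals in a separate table, which may be handy if they are needed elsewhere; otherwise the grouped mutate is shorter.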
Or an option using data.table:
library(data.table)
setkey(setDT(data), month)[data[, list(count=sum(count)), month],
per:= paste0(round(100*count/i.count,2), '%')][]
And with a bit less code:
df <- data.frame(month=c("Feb-14", "Feb-14", "Feb-14", "Mar-14", "Mar-14", "Mar-14", "Apr-14", "Apr-14", "Apr-14", "May-14"),
type=c("bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb"),
count=c(341, 527, 2674, 811, 1045, 4417, 1178, 1192, 4793, 916))
library(dplyr)
df %>% group_by(month) %>%
mutate(per=paste0(round(count/sum(count)*100, 2), "%")) %>%
ungroup
Since you want to "leave" your data frame untouched, you shouldn't use summarise; mutate will suffice.
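To see the difference between the two verbs, a small sketch (again on the first six sample rows; by_month, with_share, total and share are made-up names for illustration) compares the number of rows each one returns:

```r
library(dplyr)

df <- data.frame(
  month = c("Feb-14", "Feb-14", "Feb-14", "Mar-14", "Mar-14", "Mar-14"),
  type  = c("bbb", "ccc", "aaa", "bbb", "ccc", "aaa"),
  count = c(341, 527, 2674, 811, 1045, 4417)
)

# summarise() collapses each month to a single row
by_month <- df %>%
  group_by(month) %>%
  summarise(total = sum(count))

# mutate() keeps every row and adds the within-month share as a new column
with_share <- df %>%
  group_by(month) %>%
  mutate(share = count / sum(count)) %>%
  ungroup()

nrow(by_month)    # 2 rows, one per month
nrow(with_share)  # 6 rows, same as df
```

Inside a grouped mutate, sum(count) is evaluated once per month, so each share is relative to that month's total only.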