I have two weeks' experience in R and would appreciate your help.
I have a data table that was constructed with count(), and I want to calculate the percentage of the frequencies by categories. So if this is my data frame:
name cat1 cat2 freq
A 1 1 32
A 1 0 56
A 0 1 36
A 0 0 25
B 1 1 14
B 1 0 68
B 0 1 58
B 0 0 90
I want to calculate the percentage by name and by cat1 (the total is the sum over cat2 = 1 and cat2 = 0). I have a number of data frames, and for some of the names it could be that only the cat1=0 & cat2=0 row exists. Because of the different structures I can't do it in a straightforward way.
For example, the first line will be (32/(32+56))*100, the fourth (25/(25+36))*100.
Any ideas?
Thanks
You may want to try using data.table. You also get the advantage of speed if working with large tables.
library(data.table)
#if your data is already stored as a data frame,
#you can always skip the next step and continue with data <- data.table(data)
data <- data.table(name=rep(c("A","B"), each=4), cat1=c(1,1,0,0,1,1,0,0), cat2=c(1,0,1,0,1,0,1,0), freq=c(32,56,36,25,14,68,58,90))
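#total freq within each name/cat1 group
data[, percen := sum(freq), by=list(name,cat1)]
#each row's share of its group total (a proportion; multiply by 100 for a percentage)
data[, percen := freq/percen]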
data
> data
name cat1 cat2 freq percen
1: A 1 1 32 0.3636364
2: A 1 0 56 0.6363636
3: A 0 1 36 0.5901639
4: A 0 0 25 0.4098361
5: B 1 1 14 0.1707317
6: B 1 0 68 0.8292683
7: B 0 1 58 0.3918919
8: B 0 0 90 0.6081081
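Since you mention having several data frames with different structures, here is a minimal sketch of the same grouping done in one step and scaled to a percentage. The helper name `add_pct`, the column name `percent`, and the `sparse` example table are my own assumptions for illustration. Grouping by name and cat1 should also cover the case where a name has only the cat1=0, cat2=0 row, because that row is then the whole group.

```r
library(data.table)

# add_pct is a hypothetical helper name, not part of data.table
add_pct <- function(df) {
  # copy() so the input table is not modified by reference
  dt <- copy(as.data.table(df))
  # each row's freq as a percentage of its (name, cat1) group total
  dt[, percent := 100 * freq / sum(freq), by = .(name, cat1)]
  dt[]
}

data <- data.table(
  name = rep(c("A", "B"), each = 4),
  cat1 = c(1, 1, 0, 0, 1, 1, 0, 0),
  cat2 = c(1, 0, 1, 0, 1, 0, 1, 0),
  freq = c(32, 56, 36, 25, 14, 68, 58, 90)
)

# made-up table where name "C" has only the cat1 = 0, cat2 = 0 row
sparse <- data.table(
  name = c("A", "A", "C"),
  cat1 = c(1, 1, 0),
  cat2 = c(1, 0, 0),
  freq = c(32, 56, 25)
)

# apply the same calculation to every table in a list
results <- lapply(list(full = data, sparse = sparse), add_pct)
```

Each element of `results` is a copy of the corresponding input with the extra `percent` column, so the original tables stay as they were.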
Hope this helps.