
Calculate proportions within subsets of a data frame

Tags: r, plyr
I am trying to obtain proportions within subsets of a data frame. For example, in this made-up data frame:

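# Made-up data: 2 x 3 category combinations, three animals each.
# sample(18) shuffles 1:18, so the numbers differ between runs.
DF <- data.frame(category1 = rep(c("A", "B"), each = 9),
                 category2 = rep(rep(LETTERS[24:26], each = 3), 2),
                 animal = rep(c("dog", "cat", "mouse"), 6),
                 number = sample(18))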

I would like to calculate the proportion of each of the three animals for each category1 by category2 combination (e.g., out of all animals that are both "A" and "X", what proportion are dogs?). With prop.table on column 4 of the data frame I can get the proportion that each row makes up of the total "number" column, but I have not found a way to do this for subsets based on category1 and category2. I also tried splitting the data by category1 and category2 using this:

splitDF<-split(DF,list(DF$category1,DF$category2))

And I was hoping I could then apply a function with prop.table to get the proportions of each animal within each split group, but I cannot get prop.table working because I can't seem to specify which column of data to apply the function to within the split groups. Does anyone have any tips? Maybe this is possible with plyr or something similar? I can't find anything in the help forums about ways to get proportions within subsets of data.

user2093526 asked Feb 21 '13 17:02


1 Answer

You can use the ddply() function from the plyr package to calculate the proportions within each category1 by category2 combination and add them to the data frame as a new column.

library(plyr)
# Within each category1 x category2 group, divide number by the group total
DF <- ddply(DF, .(category1, category2), transform, prop = number / sum(number))
DF
   category1 category2 animal number       prop
1          A         X    dog     17 0.44736842
2          A         X    cat      3 0.07894737
3          A         X  mouse     18 0.47368421
4          A         Y    dog      2 0.14285714
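
For comparison, the same per-group proportions can probably be obtained in base R without plyr. The following is only a sketch and not part of the original answer: it rebuilds the DF from the question, and the set.seed() call and the names prop2 and DF2 are assumptions added for illustration. It shows ave(), which returns a vector aligned with the original rows, and a version of the split() idea from the question in which prop.table() is applied to the number column of each piece.

```r
# Rebuild the example data from the question.
# Assumption: the seed is fixed only so the sketch is reproducible.
set.seed(1)
DF <- data.frame(category1 = rep(c("A", "B"), each = 9),
                 category2 = rep(rep(LETTERS[24:26], each = 3), 2),
                 animal = rep(c("dog", "cat", "mouse"), 6),
                 number = sample(18))

# ave() applies FUN within each category1 x category2 group and
# returns the results in the original row order.
DF$prop <- ave(DF$number, DF$category1, DF$category2,
               FUN = function(x) x / sum(x))

# The split() approach from the question: add the column inside each
# piece with transform(), then bind the pieces back together.
splitDF <- split(DF, list(DF$category1, DF$category2))
splitDF <- lapply(splitDF, function(d) transform(d, prop2 = prop.table(number)))
DF2 <- do.call(rbind, splitDF)
```

In this sketch DF2 holds the same rows as DF but ordered by group, and its prop and prop2 columns should agree, since both divide number by the group total.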
Didzis Elferts answered Nov 11 '22 12:11