 

Aggregate or summarize to get ratios

Tags: r, aggregate

The following is a toy problem that demonstrates my question.

I have a data frame of employees; for each employee it records a name, salary, gender, and state.

aggregate(salary ~ state, data, FUN = mean)  # Returns the average salary per state
aggregate(salary ~ state + gender, data, FUN = mean)  # Avg salary per state/gender
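To make these calls concrete, here is a small sketch on a hypothetical `employees` data frame. The names, salaries and states are invented for illustration and are not the asker's data. The formula interface of `aggregate` needs a `data` argument (unless the columns exist as standalone variables) and an explicit `FUN`.

```r
# Hypothetical toy data (invented for illustration, not the asker's data)
employees <- data.frame(
  name   = c("Ann", "Bob", "Cat", "Dan", "Eve", "Fay"),
  salary = c(50000, 60000, 70000, 40000, 30000, 90000),
  gender = c("Woman", "Man", "Woman", "Man", "Woman", "Woman"),
  state  = c("CA", "CA", "CA", "NY", "NY", "NY")
)

# Average salary per state
avg_state <- aggregate(salary ~ state, data = employees, FUN = mean)
print(avg_state)

# Average salary per state/gender combination (one row per combination present)
avg_state_gender <- aggregate(salary ~ state + gender, data = employees, FUN = mean)
print(avg_state_gender)
```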

What I actually need is a summary of the fraction of the total salary earned by women in each state.

aggregate(salary ~ state + gender, data, FUN = sum)  

returns the total salary earned by women (and men) in each state, but what I really need is salary_w / salary_total on a per-state level. I could write a for loop, but I am wondering whether there is some way to use aggregate to do that.
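One possible base-R way to stay with `aggregate` is sketched below, again on a hypothetical `employees` data frame. `aggregate` applies `FUN` to each response column separately, so `FUN` never sees `gender` and `salary` together. A workaround is to add a helper column (the hypothetical `salary_w`) holding each woman's salary and 0 for men, sum both columns per state in one call with `cbind()` on the left-hand side of the formula, and then divide.

```r
# Hypothetical toy data (invented for illustration, not the asker's data)
employees <- data.frame(
  name   = c("Ann", "Bob", "Cat", "Dan", "Eve", "Fay"),
  salary = c(50000, 60000, 70000, 40000, 30000, 90000),
  gender = c("Woman", "Man", "Woman", "Man", "Woman", "Woman"),
  state  = c("CA", "CA", "CA", "NY", "NY", "NY")
)

# Helper column: the salary if the employee is a woman, otherwise 0
employees$salary_w <- ifelse(employees$gender == "Woman", employees$salary, 0)

# Sum both columns per state in a single aggregate() call
totals <- aggregate(cbind(salary_w, salary) ~ state, data = employees, FUN = sum)

# Fraction of each state's total salary earned by women
totals$ratio <- totals$salary_w / totals$salary
print(totals)
```

Because the helper column is zero for men, its per-state sum is exactly the women's total, so `ratio` corresponds to salary_w / salary_total from the question.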

asked Dec 02 '10 23:12 by bsdfish


1 Answer

Another option would be to use plyr. ddply() expects a data frame as input and returns a data frame as output. The second argument specifies how to split the data frame (here, by state). The third argument is the function to apply to each chunk; here we use summarise, which builds a new data frame from each chunk of the existing one. Any further arguments, such as ratio = ..., are passed on to summarise.

library(plyr)

# Using the sample data frame d from kohske's answer (columns state, gender, salary)

> ddply(d, .(state), summarise, ratio = sum(salary[gender == "Woman"]) / sum(salary))
  state     ratio
1     1 0.5789860
2     2 0.4530224
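For comparison, the split-apply-combine steps that ddply() performs can be sketched in base R with split() and sapply(). This is not how plyr is implemented internally, only an equivalent computation. It uses a hypothetical `employees` data frame with the answer's "Woman" label, so the numbers differ from the output above.

```r
# Hypothetical toy data (invented for illustration, not kohske's data)
employees <- data.frame(
  name   = c("Ann", "Bob", "Cat", "Dan", "Eve", "Fay"),
  salary = c(50000, 60000, 70000, 40000, 30000, 90000),
  gender = c("Woman", "Man", "Woman", "Man", "Woman", "Woman"),
  state  = c("CA", "CA", "CA", "NY", "NY", "NY")
)

# Split: one data frame per state
chunks <- split(employees, employees$state)

# Apply: women's share of total salary within each chunk
ratios <- sapply(chunks, function(chunk) {
  sum(chunk$salary[chunk$gender == "Woman"]) / sum(chunk$salary)
})

# Combine: collect the per-state results into one data frame
result <- data.frame(state = names(ratios), ratio = unname(ratios))
print(result)
```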
answered Sep 27 '22 02:09 by Chase