The following is a toy problem that demonstrates my question.
I have a data frame that contains a bunch of employees; for each employee, it has a name, salary, gender and state.
aggregate(salary ~ state) # Returns the average salary per state
aggregate(salary ~ state + gender, data, FUN = mean) # Avg salary per state/gender
What I actually need is a summary of the fraction of the total salary earned by women in each state.
aggregate(salary ~ state + gender, data, FUN = sum)
returns the total salary earned by women (and men) in each state ,but what I really need is salary_w / salary_total on a per-state level. I can write a for-loop, etc -- but I am wondering if there is some way to use aggregate to do that.
Another option would be using plyr. ddply()
expects a data.frame as an input and will return a data.frame as an output. The second argument is how you want to split the data frame. The third argument is what we want to apply to the chunks, here we are using summarise
to create a new data.frame from the existing data.frame.
library(plyr)
#Using the sample data from kohske's answer above
> ddply(d, .(state), summarise, ratio = sum(salary[gender == "Woman"]) / sum(salary))
state ratio
1 1 0.5789860
2 2 0.4530224
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With