I have a dataset in R of student weekly allowances by class, which looks like:
Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220
How can I summarize the results by group (Year/Class) to get the sum and the percentage (by group)? Getting the sum seems easy with ddply,
but I just couldn't get the percentage by group part right.
It works for the sum:
summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance))
But it doesn't work for the percentage by group part:
summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance),
Allow_Pct=Allowance/sum(Allowance))
Ideal result should look like:
Year Class Sum_Allow Allow_Pct
2013 Freshman 210 26%
2013 Junior 250 31%
2013 Sophomore 350 43%
2014 Freshman 360 40%
2014 Junior 330 36%
2014 Sophomore 220 24%
I tried ddply from the plyr package, but please let me know of any way that this may work.
Here's a possible solution using the data.table package (assuming your data is called df):
library(data.table)
setDT(df)[, list(Sum_Allow = sum(Allowance)), keyby = list(Year, Class)][,
Allow_Pct := paste0(round(Sum_Allow/sum(Sum_Allow), 2)*100, "%"), by = Year][]
# Year Class Sum_Allow Allow_Pct
# 1: 2013 Freshman 210 26%
# 2: 2013 Junior 250 31%
# 3: 2013 Sophomore 350 43%
# 4: 2014 Freshman 360 40%
# 5: 2014 Junior 330 36%
# 6: 2014 Sophomore 220 24%
Credit to @rawr, here's a possible base R solution:
df2 <- aggregate(Allowance ~ Class + Year, df, sum)
transform(df2, Allow_pct = ave(Allowance, Year, FUN = function(x) paste0(round(x/sum(x), 2)*100, "%")))
# Class Year Allowance Allow_pct
# 1 Freshman 2013 210 26%
# 2 Junior 2013 250 31%
# 3 Sophomore 2013 350 43%
# 4 Freshman 2014 360 40%
# 5 Junior 2014 330 36%
# 6 Sophomore 2014 220 24%
You could do this in two steps: first sum by Year and Class, then compute each class's share within its Year:
my_data <- read.table(header = TRUE,
text = "Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220")
library(plyr)
(summ <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance)))
# Year Class Sum_Allow
# 1 2013 Freshman 210
# 2 2013 Junior 250
# 3 2013 Sophomore 350
# 4 2014 Freshman 360
# 5 2014 Junior 330
# 6 2014 Sophomore 220
ddply(summ, .(Year), mutate, Allow_pct = Sum_Allow / sum(Sum_Allow) * 100)
# Year Class Sum_Allow Allow_pct
# 1 2013 Freshman 210 25.92593
# 2 2013 Junior 250 30.86420
# 3 2013 Sophomore 350 43.20988
# 4 2014 Freshman 360 39.56044
# 5 2014 Junior 330 36.26374
# 6 2014 Sophomore 220 24.17582
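The percentages above are left as unrounded numbers, while the question asked for labels such as 26%. Here is a minimal base R sketch of one way to get both, assuming the same my_data as above. It keeps a numeric column for further calculation and adds a separate text column for display. The column name Allow_Label is made up for this illustration and does not come from the question.

```r
# Rebuild the sample data from the question.
my_data <- read.table(header = TRUE,
text = "Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220")

# Step 1: total allowance per Year/Class.
summ <- aggregate(Allowance ~ Class + Year, data = my_data, FUN = sum)
names(summ)[names(summ) == "Allowance"] <- "Sum_Allow"

# Step 2: share of each class within its year, kept numeric.
# ave() repeats the yearly total on every row of that year.
year_total <- ave(summ$Sum_Allow, summ$Year, FUN = sum)
summ$Allow_Pct <- summ$Sum_Allow / year_total * 100

# Step 3: text label for display only (Allow_Label is a hypothetical name).
summ$Allow_Label <- paste0(round(summ$Allow_Pct), "%")

print(summ[, c("Year", "Class", "Sum_Allow", "Allow_Label")])
```

Keeping Allow_Pct numeric means you can still sort or plot by it; once the value is pasted together with "%" it becomes a character string.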
I don't know if it happens for the rest of you, but when I run the original attempt, R crashes rather than throwing a warning. It also crashes if I mix up the capitalization (Allow vs. allow). I really hate that; hadley pls fix
base r forever