
Summarizing by subgroup percentage in R

I have a dataset like this:

    df = data.frame(group = c(rep('A',4), rep('B',3)),
                    subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                    value = c(1,4,2,1,1,2,3))

    group | subgroup | value
    ------------------------
      A   |    a     |  1
      A   |    b     |  4
      A   |    c     |  2
      A   |    d     |  1
      B   |    a     |  1
      B   |    b     |  2
      B   |    c     |  3

What I want is to get the percentage of the values of each subgroup within each group, i.e. the output should be:

    group | subgroup | percent
    --------------------------
      A   |    a     |  0.125
      A   |    b     |  0.500
      A   |    c     |  0.250
      A   |    d     |  0.125
      B   |    a     |  0.167
      B   |    b     |  0.333
      B   |    c     |  0.500

Example for group A, subgroup a: the value is 1 and the sum of the whole group A is 8 (a=1, b=4, c=2, d=1), hence 1/8 = 0.125.

So far I've only found fairly simple aggregates, but I cannot figure out how to do the "divide by the sum within a group" part.

oliver13 asked Nov 25 '14 18:11



2 Answers

Per your comment, if the subgroups are unique within each group, you can do:

    library(dplyr)
    group_by(df, group) %>% mutate(percent = value/sum(value))
    #   group subgroup value   percent
    # 1     A        a     1 0.1250000
    # 2     A        b     4 0.5000000
    # 3     A        c     2 0.2500000
    # 4     A        d     1 0.1250000
    # 5     B        a     1 0.1666667
    # 6     B        b     2 0.3333333
    # 7     B        c     3 0.5000000

Or, to drop the value column and add the percent column in one step, use transmute:

    group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
    #   group subgroup   percent
    # 1     A        a 0.1250000
    # 2     A        b 0.5000000
    # 3     A        c 0.2500000
    # 4     A        d 0.1250000
    # 5     B        a 0.1666667
    # 6     B        b 0.3333333
    # 7     B        c 0.5000000
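If a subgroup can appear more than once within a group, one option is to sum the values per (group, subgroup) first and only then divide by the group total. The following is a minimal sketch in base R so that it runs without extra packages; the data frame `df2` and its values are made up for illustration and are not from the question.

```r
# Hypothetical data: subgroup 'a' appears twice in group A
df2 <- data.frame(group    = c('A', 'A', 'A', 'B', 'B'),
                  subgroup = c('a', 'a', 'b', 'a', 'b'),
                  value    = c(1, 2, 1, 1, 3))

# Step 1: collapse duplicates to one row per (group, subgroup)
agg <- aggregate(value ~ group + subgroup, data = df2, FUN = sum)

# Step 2: divide each subgroup total by the total of its group
agg$percent <- ave(agg$value, agg$group, FUN = function(x) x / sum(x))

# Sort by group, then subgroup, for readability
agg <- agg[order(agg$group, agg$subgroup), ]
print(agg)
```

With dplyr the same two steps would presumably be a `summarise` over group and subgroup followed by a `mutate` that is still grouped by group.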
Rich Scriven answered Sep 18 '22 17:09


We can use prop.table to calculate percentage/ratio.
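For a plain numeric vector, `prop.table(x)` returns the same thing as `x / sum(x)`. A quick check using the group A values from the question:

```r
x <- c(1, 4, 2, 1)       # values of group A from the question
p <- prop.table(x)       # each value divided by the sum of all values

print(p)                 # 0.125 0.500 0.250 0.125
print(all.equal(p, x / sum(x)))  # TRUE
```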

Base R:

    transform(df, percent = ave(value, group, FUN = prop.table))

    #  group subgroup value percent
    #1     A        a     1   0.125
    #2     A        b     4   0.500
    #3     A        c     2   0.250
    #4     A        d     1   0.125
    #5     B        a     1   0.167
    #6     B        b     2   0.333
    #7     B        c     3   0.500

dplyr:

    library(dplyr)
    df %>% group_by(group) %>% mutate(percent = prop.table(value))

data.table:

    library(data.table)
    setDT(df)[, percent := prop.table(value), group]
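Whichever variant is used, a simple sanity check is that the proportions add up to 1 within every group. The sketch below rebuilds the question's data, computes the proportions with the base R approach, verifies the group totals, and adds a formatted label column; the names `totals` and `percent_label` are introduced here only for illustration.

```r
# Data from the question
df <- data.frame(group    = c(rep('A', 4), rep('B', 3)),
                 subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                 value    = c(1, 4, 2, 1, 1, 2, 3))

# Proportion of each value within its group
df$percent <- ave(df$value, df$group, FUN = prop.table)

# Sanity check: proportions should sum to 1 in each group
totals <- tapply(df$percent, df$group, sum)
stopifnot(isTRUE(all.equal(as.vector(totals), c(1, 1))))

# Optional: a human-readable label such as "12.5%"
df$percent_label <- sprintf("%.1f%%", 100 * df$percent)
print(df)
```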
Ronak Shah answered Sep 17 '22 17:09