Summarizing by subgroup percentage in R

Tags: r, aggregate, percentage

I have a dataset like this:

    df = data.frame(group = c(rep('A',4), rep('B',3)),
                    subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                    value = c(1,4,2,1,1,2,3))

    group | subgroup | value
    ------------------------
      A   |    a     |  1
      A   |    b     |  4
      A   |    c     |  2
      A   |    d     |  1
      B   |    a     |  1
      B   |    b     |  2
      B   |    c     |  3

What I want is to get the percentage of the values of each subgroup within each group, i.e. the output should be:

    group | subgroup | percent
    ------------------------
      A   |    a     |  0.125
      A   |    b     |  0.500
      A   |    c     |  0.250
      A   |    d     |  0.125
      B   |    a     |  0.167
      B   |    b     |  0.333
      B   |    c     |  0.500

Example for group A, subgroup a: the value is 1 and the sum of the whole group A is 8 (a=1, b=4, c=2, d=1), hence 1/8 = 0.125.

So far I've only found fairly simple aggregates, and I cannot figure out how to do the "divide by the sum within a group" part.


asked Nov 25 '14 18:11

oliver13

2 Answers

Per your comment, if the subgroups are unique within each group, you can do:

    library(dplyr)
    group_by(df, group) %>% mutate(percent = value/sum(value))
    #   group subgroup value   percent
    # 1     A        a     1 0.1250000
    # 2     A        b     4 0.5000000
    # 3     A        c     2 0.2500000
    # 4     A        d     1 0.1250000
    # 5     B        a     1 0.1666667
    # 6     B        b     2 0.3333333
    # 7     B        c     3 0.5000000

Or, to drop the `value` column and add the `percent` column in one step, use `transmute`:

    group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
    #   group subgroup   percent
    # 1     A        a 0.1250000
    # 2     A        b 0.5000000
    # 3     A        c 0.2500000
    # 4     A        d 0.1250000
    # 5     B        a 0.1666667
    # 6     B        b 0.3333333
    # 7     B        c 0.5000000
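The "unique subgroups" caveat matters because, when a subgroup repeats within a group, dividing each row by the group sum yields one share per row rather than one per subgroup. A minimal sketch of one way to handle that case: sum the duplicates first, then divide. It assumes dplyr 1.0.0 or later (for the `.groups` argument), and the data frame `df2` is made up for illustration, not taken from the question.

```r
library(dplyr)

# Hypothetical data: subgroup 'a' appears twice within group A
df2 <- data.frame(group    = c('A', 'A', 'A', 'B', 'B'),
                  subgroup = c('a', 'a', 'b', 'a', 'b'),
                  value    = c(1, 2, 3, 1, 3))

res <- df2 %>%
  group_by(group, subgroup) %>%
  # Collapse duplicate subgroups; the result stays grouped by `group` only
  summarise(value = sum(value), .groups = "drop_last") %>%
  # Each subgroup total divided by its group's total
  mutate(percent = value / sum(value)) %>%
  ungroup()

print(res)
```

Here group A totals 6 (a = 1 + 2, b = 3), so both of its subgroups should come out at 0.5.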

answered Sep 18 '22 17:09

Rich Scriven

We can use `prop.table` to calculate each value's proportion of the total.
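As a quick illustration of what `prop.table` does on a plain numeric vector, here is a small base R sketch using group A's values from the question. The variable names `x` and `p` are only for this example.

```r
x <- c(1, 4, 2, 1)    # group A's values from the question

p <- prop.table(x)    # each entry divided by sum(x)
print(p)

# Same result as the explicit division
print(isTRUE(all.equal(p, x / sum(x))))
```

Applying this once per group is what the `ave`, `group_by` and `by` variants take care of.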

Base R:

    transform(df, percent = ave(value, group, FUN = prop.table))

    #  group subgroup value percent
    #1     A        a     1   0.125
    #2     A        b     4   0.500
    #3     A        c     2   0.250
    #4     A        d     1   0.125
    #5     B        a     1   0.167
    #6     B        b     2   0.333
    #7     B        c     3   0.500

dplyr:

    library(dplyr)
    df %>% group_by(group) %>% mutate(percent = prop.table(value))

data.table:

    library(data.table)
    setDT(df)[, percent := prop.table(value), group]
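The question's expected output is shown to three decimals (0.167, 0.333), while the raw shares carry more digits. A small base R sketch, assuming rounding with `round(..., 3)` is acceptable for display, plus a sanity check that the unrounded shares total 1 within each group. The names `res` and `shares` are only for this example.

```r
df <- data.frame(group = c(rep('A', 4), rep('B', 3)),
                 subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                 value = c(1, 4, 2, 1, 1, 2, 3))

# Share of each value within its group, rounded to three decimals
res <- transform(df, percent = round(ave(value, group, FUN = prop.table), 3))
print(res)

# Sanity check on the unrounded shares: each group should total 1
shares <- ave(df$value, df$group, FUN = prop.table)
print(tapply(shares, df$group, sum))
```

Rounding is best left to the final display step, since rounded shares need not add up to exactly 1.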

answered Sep 17 '22 17:09

Ronak Shah
