I have a dataset like this:
df = data.frame(group = c(rep('A',4), rep('B',3)),
                subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                value = c(1,4,2,1,1,2,3))

group | subgroup | value
------------------------
A     | a        | 1
A     | b        | 4
A     | c        | 2
A     | d        | 1
B     | a        | 1
B     | b        | 2
B     | c        | 3

What I want is to get the percentage of the values of each subgroup within each group, i.e. the output should be:
group | subgroup | percent
--------------------------
A     | a        | 0.125
A     | b        | 0.500
A     | c        | 0.250
A     | d        | 0.125
B     | a        | 0.167
B     | b        | 0.333
B     | c        | 0.500

Example for group A, subgroup a: the value is 1 and the sum of the whole group A is 8 (a=1, b=4, c=2, d=1), hence 1/8 = 0.125.
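Stated as a formula, using hypothetical notation where v_{g,s} is the value of subgroup s in group g and p_{g,s} is the requested share, with the worked example for group A, subgroup a:

```latex
p_{g,s} = \frac{v_{g,s}}{\sum_{s'} v_{g,s'}}, \qquad
p_{A,a} = \frac{1}{1 + 4 + 2 + 1} = \frac{1}{8} = 0.125
```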
So far I've only found fairly simple aggregates like this, but I cannot figure out how to do the "divide by the sum within a group" part.
Find the percentage of each category in a data frame with columns Group and DV (note that this counts rows per category; it does not sum a value column):

library(dplyr)
df %>% group_by(Group) %>% summarise(Percentage = n()/nrow(.))
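For comparison, a base R sketch of the same row-count share, using prop.table on a frequency table. It rebuilds the question's df and uses its group column in place of Group; like the dplyr snippet, it counts rows per category and ignores value.

```r
df <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                 subgroup = c("a", "b", "c", "d", "a", "b", "c"),
                 value = c(1, 4, 2, 1, 1, 2, 3))

# table() counts the rows in each group; prop.table() divides each count by the total
row_share <- prop.table(table(df$group))
row_share  # A = 4/7, B = 3/7
```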
How do you find the percentage of values in an R data frame column that lie within a range? First, create a data frame. Then use the sum function on a logical condition built from the range's lower and upper bounds, and divide by the column's length to get the fraction of values that lie within that range (multiply by 100 for a percentage).
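A minimal sketch of those steps, assuming a hypothetical column score and an inclusive range [5, 10]; neither the data nor the bounds come from the original text.

```r
# Hypothetical example data; any numeric column works the same way
scores <- data.frame(score = c(3, 7, 12, 5, 9, 15, 1, 8))
lower <- 5   # inclusive lower bound (assumed)
upper <- 10  # inclusive upper bound (assumed)

# The logical vector is TRUE where a value falls inside [lower, upper];
# sum() counts the TRUEs and length() gives the total number of values
in_range <- sum(scores$score >= lower & scores$score <= upper)
pct_in_range <- in_range / length(scores$score) * 100
pct_in_range  # 50: four of the eight values (7, 5, 9, 8) lie in [5, 10]
```

If the column can contain NA, sum() returns NA unless you pass na.rm = TRUE, and you then need to decide whether length() should still count the missing rows.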
To find the percentage of missing values in each column of an R data frame, we can use the colMeans function with the is.na function. is.na marks each cell as TRUE (missing) or FALSE, so colMeans gives the proportion of missing values in each column. Multiplying that output by 100 gives the percentage.
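A short sketch of that recipe on a hypothetical data frame dat with one numeric and one character column:

```r
# Hypothetical data frame with some missing entries
dat <- data.frame(a = c(1, NA, 3, NA),
                  b = c("x", "y", NA, "z"))

# is.na() on a data frame returns a logical matrix; colMeans() treats TRUE as 1,
# so each column mean is the proportion of missing cells in that column
pct_missing <- colMeans(is.na(dat)) * 100
pct_missing  # a: 50, b: 25
```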
Per your comment, if the subgroups are unique within each group, you can do:
library(dplyr)
group_by(df, group) %>% mutate(percent = value/sum(value))
#   group subgroup value   percent
# 1     A        a     1 0.1250000
# 2     A        b     4 0.5000000
# 3     A        c     2 0.2500000
# 4     A        d     1 0.1250000
# 5     B        a     1 0.1666667
# 6     B        b     2 0.3333333
# 7     B        c     3 0.5000000

Or, to remove the value column and add the percent column at the same time, use transmute:

group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
#   group subgroup   percent
# 1     A        a 0.1250000
# 2     A        b 0.5000000
# 3     A        c 0.2500000
# 4     A        d 0.1250000
# 5     B        a 0.1666667
# 6     B        b 0.3333333
# 7     B        c 0.5000000
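The code above assumes each subgroup appears only once per group. If a subgroup can repeat within a group (a case the question does not show), one option is to sum the duplicates first and then divide by the group total. A base R sketch with hypothetical data df2:

```r
# Hypothetical data in which subgroup "a" appears twice within group A
df2 <- data.frame(group = c("A", "A", "A", "B"),
                  subgroup = c("a", "a", "b", "a"),
                  value = c(1, 2, 3, 4))

# Step 1: collapse duplicate group/subgroup rows by summing their values
agg <- aggregate(value ~ group + subgroup, data = df2, FUN = sum)

# Step 2: divide each subgroup total by its group total
agg$percent <- ave(agg$value, agg$group, FUN = prop.table)
agg  # A/a = 0.5, A/b = 0.5, B/a = 1 (row order follows aggregate's sorting)
```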
We can use prop.table to calculate percentage/ratio.
Base R:

transform(df, percent = ave(value, group, FUN = prop.table))
#   group subgroup value percent
# 1     A        a     1   0.125
# 2     A        b     4   0.500
# 3     A        c     2   0.250
# 4     A        d     1   0.125
# 5     B        a     1   0.167
# 6     B        b     2   0.333
# 7     B        c     3   0.500

dplyr:

library(dplyr)
df %>% group_by(group) %>% mutate(percent = prop.table(value))

data.table:

library(data.table)
setDT(df)[, percent := prop.table(value), group]
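To show what prop.table and ave each contribute in the base R version, here is a small sketch. The percent_display column is a hypothetical addition that rescales the proportions to percentages, since the question asks for "percent" while its expected output is in proportions.

```r
df <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                 subgroup = c("a", "b", "c", "d", "a", "b", "c"),
                 value = c(1, 4, 2, 1, 1, 2, 3))

# For a plain vector, prop.table(x) is simply x / sum(x)
prop.table(c(1, 4, 2, 1))  # 0.125 0.500 0.250 0.125

# ave() splits value by group, applies FUN to each piece, and returns the
# results in the original row order
df$percent <- ave(df$value, df$group, FUN = prop.table)

# Multiply by 100 (and round, if desired) to show percentages instead of proportions
df$percent_display <- round(100 * df$percent, 1)
df
```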