I have a dataset like this:
    df = data.frame(group = c(rep('A',4), rep('B',3)),
                    subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                    value = c(1,4,2,1,1,2,3))

    group | subgroup | value
    ------------------------
    A     | a        | 1
    A     | b        | 4
    A     | c        | 2
    A     | d        | 1
    B     | a        | 1
    B     | b        | 2
    B     | c        | 3
What I want is to get the percentage of the values of each subgroup within each group, i.e. the output should be:
    group | subgroup | percent
    ------------------------
    A     | a        | 0.125
    A     | b        | 0.500
    A     | c        | 0.250
    A     | d        | 0.125
    B     | a        | 0.167
    B     | b        | 0.333
    B     | c        | 0.500
For example, for group A, subgroup a: the value is 1 and the sum of the whole group A is 8 (a=1, b=4, c=2, d=1), hence 1/8 = 0.125.
So far I've only found fairly simple aggregates like this, but I cannot figure out how to do the "divide by a sum within a subgroup" part.
Find the percentage of each category in a data frame:

    library(dplyr)
    # here df is a data frame with the columns Group and DV
    df %>% group_by(Group) %>% summarise(Percentage = n()/nrow(.))
To find the percentage of values that lie within a range in an R data frame column, first create a data frame. Then use the sum function with the lower and upper bounds of the range, together with the length function, to compute the percentage of values that fall within that range.
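A minimal sketch of the range calculation described above, in base R. The data frame d, the column x, and the bounds 2 and 5 are made up for illustration and are not part of the original question.

```r
# Hypothetical data; the column name and the bounds are illustrative assumptions
d <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8))
lower <- 2
upper <- 5

# sum() counts the TRUE values of the logical test; length() gives the total count
pct_in_range <- sum(d$x >= lower & d$x <= upper) / length(d$x) * 100
pct_in_range
```

Here both bounds are treated as inclusive; whether that is the right choice depends on the data.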
To find the percentage of missing values in each column of an R data frame, we can use the colMeans function together with the is.na function. This gives the proportion of missing values in each column. Multiplying the result by 100 gives the percentage.
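As a rough illustration of the colMeans and is.na approach, here is a small base R sketch. The data frame d and its columns are invented for the example.

```r
# Hypothetical data with some missing values
d <- data.frame(a = c(1, NA, 3, NA),
                b = c("x", "y", NA, "z"))

# is.na() returns a logical matrix; colMeans() averages each column (TRUE counts as 1)
pct_missing <- colMeans(is.na(d)) * 100
pct_missing
```

The result is a named numeric vector with one entry per column.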
Per your comment, if the subgroups are unique you can do
    library(dplyr)
    group_by(df, group) %>% mutate(percent = value/sum(value))
    #   group subgroup value   percent
    # 1     A        a     1 0.1250000
    # 2     A        b     4 0.5000000
    # 3     A        c     2 0.2500000
    # 4     A        d     1 0.1250000
    # 5     B        a     1 0.1666667
    # 6     B        b     2 0.3333333
    # 7     B        c     3 0.5000000
Or, to remove the value column and add the percent column at the same time, use transmute:
    group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
    #   group subgroup   percent
    # 1     A        a 0.1250000
    # 2     A        b 0.5000000
    # 3     A        c 0.2500000
    # 4     A        d 0.1250000
    # 5     B        a 0.1666667
    # 6     B        b 0.3333333
    # 7     B        c 0.5000000
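The answer above assumes the subgroups are unique within each group. If they were not, one possible reading (an assumption about what is wanted, not something stated in the question) is to sum value per group and subgroup first and then divide by the group total. A base R sketch with a made-up data frame df2:

```r
# Hypothetical data in which subgroup 'a' appears twice in group A
df2 <- data.frame(group    = c('A', 'A', 'A', 'B', 'B'),
                  subgroup = c('a', 'a', 'b', 'a', 'b'),
                  value    = c(1, 3, 4, 1, 3))

# Step 1: collapse duplicates so there is one row per group/subgroup pair
agg <- aggregate(value ~ group + subgroup, data = df2, FUN = sum)

# Step 2: divide each value by the total of its group
agg$percent <- ave(agg$value, agg$group, FUN = function(v) v / sum(v))

# Sort for readability
agg <- agg[order(agg$group, agg$subgroup), ]
agg
```

The same two steps could be written with dplyr by grouping on both columns, summarising, and then regrouping on group alone.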
We can use prop.table to calculate the percentage/ratio.
Base R:

    transform(df, percent = ave(value, group, FUN = prop.table))
    #  group subgroup value percent
    #1     A        a     1   0.125
    #2     A        b     4   0.500
    #3     A        c     2   0.250
    #4     A        d     1   0.125
    #5     B        a     1   0.167
    #6     B        b     2   0.333
    #7     B        c     3   0.500
dplyr:

    library(dplyr)
    df %>% group_by(group) %>% mutate(percent = prop.table(value))
data.table:

    library(data.table)
    setDT(df)[, percent := prop.table(value), group]
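For a plain numeric vector, prop.table divides each element by the vector's sum, so it should give the same result as writing value/sum(value) by hand. A small base R check on the question's data (the variable names via_prop_table and via_division are only for this sketch):

```r
# Same data as in the question
df <- data.frame(group = c(rep('A', 4), rep('B', 3)),
                 subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                 value = c(1, 4, 2, 1, 1, 2, 3))

# Per-group share computed two ways
via_prop_table <- ave(df$value, df$group, FUN = prop.table)
via_division   <- ave(df$value, df$group, FUN = function(v) v / sum(v))

# Compare the two results, then show them rounded to three decimals
all.equal(via_prop_table, via_division)
round(via_prop_table, 3)
```

Which form to use is mostly a matter of taste; prop.table is shorter, while the explicit division makes the arithmetic visible.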