I have a long dataset with one row per individual grouped with schools. Each row has an ordered factor {1, 2, 3, 4}, "cats". I want to get the percentage of 1's, 2's, 3's and 4's within each school. The dataset looks like this:
school_number cats
1 10505 3
2 10505 3
3 10502 1
4 10502 1
5 10502 2
6 10502 1
7 10502 1
8 10502 2
10 10503 3
11 10505 2
I tried something like this:
df_pcts <- df %>%
group_by(school_number) %>%
mutate(total=sum(table(cats))) %>%
summarize(cat_pct = table(cats)/total)
but the total variable produced by the mutate() step puts the grand total number of rows in every row. I can't even get to the final summarize step. I'm confused.
P.S. In some other posts I saw lines like this:
n = n()
when I do that I get a message saying,
Error in n() : This function should not be called directly
Where did this come from?
TIA
Perhaps this helps a little, though I'm not 100% sure of what output you need.
This counts the number of rows of each combination of school_number/cats that exist in your df using tally
. Then calculates the percentage of 'cats' in each school_number by then only grouping by school_number.
df %>%
group_by(school_number,cats) %>%
tally %>%
group_by(school_number) %>%
mutate(pct=(100*n)/sum(n))
It gives this:
# school_number cats n pct
# 1 10502 1 4 66.66667
# 2 10502 2 2 33.33333
# 3 10503 3 1 100.00000
# 4 10505 2 1 33.33333
# 5 10505 3 2 66.66667
EDIT:
to add in rows with 0% that are missing from your sample data, you could do the following. Bind together the output above with a df that contains 0% for all school_number/cats combinations. Only keep the first instance of this bind (first instances always containing values >0% should they exist). I then arranged it by school_number and cats for ease of reading:
y<-df %>%
group_by(school_number,cats) %>%
tally %>%
group_by(school_number) %>%
mutate(pct=(100*n)/sum(n)) %>%
select(-n)
x<-data.frame(school_number=rep(unique(df$school_number),each=4), cats=1:4,pct=0)
rbind(y,x) %>%
group_by(school_number,cats)%>%
filter(row_number() == 1) %>%
arrange(school_number,cats)
which gives:
# school_number cats pct
#1 10502 1 66.66667
#2 10502 2 33.33333
#3 10502 3 0.00000
#4 10502 4 0.00000
#5 10503 1 0.00000
#6 10503 2 0.00000
#7 10503 3 100.00000
#8 10503 4 0.00000
#9 10505 1 0.00000
#10 10505 2 33.33333
#11 10505 3 66.66667
#12 10505 4 0.00000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With