
ggplot2 - Multi-group histogram with in-group proportions rather than frequency

I have three cohorts of students identified by an ExperimentCohort factor. For each student, I have a LetterGrade, also a factor. I'd like to plot a histogram-like bar graph of LetterGrade for each ExperimentCohort. Using

    ggplot(df, alpha = 0.2,
           aes(x = LetterGrade, group = ExperimentCohort, fill = ExperimentCohort)) +
      geom_bar(position = "dodge")

gets me very close, but the three ExperimentCohorts don't have the same number of students. To compare these on a more even field, I'd like the y-axis to be the in-cohort proportion of each letter-grade. So far, short of calculating this proportion and putting it in a separate dataframe before plotting, I have not been able to find a way to do this.

Every solution to a similar question on SO and elsewhere involves aes(y = ..count../sum(..count..)), but sum(..count..) is executed across the whole dataframe rather than within each cohort. Anyone got a suggestion? Here's code to create an example dataframe:

    df <- data.frame(ID = 1:60,
                     LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = T),
                     ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = T))

Thanks.

Claire Sannier asked Jun 28 '13 15:06

2 Answers

Wrong solution

You can use stat_bin() and y=..density.. to get percentages in each group.

    ggplot(df, alpha = 0.2,
           aes(x = LetterGrade, group = ExperimentCohort, fill = ExperimentCohort)) +
      stat_bin(aes(y = ..density..), position = 'dodge')

UPDATE - correct solution

As pointed out by @rpierce, y=..density.. calculates density values for each group, not percentages (the two are not the same).
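To make the difference concrete, here is a minimal sketch in base R rather than ggplot2. The toy values and the bin width of 0.5 are assumptions chosen for illustration. A density is the in-bin proportion divided by the bin width, so the two only coincide when the width happens to be 1.

```r
# Toy data; the values and the bin width are illustrative assumptions.
x <- c(1, 1, 2, 3, 3, 3, 4, 4)
width <- 0.5
h <- hist(x, breaks = seq(0, 4, by = width), plot = FALSE)

proportion <- h$counts / sum(h$counts)  # share of observations in each bin
density <- h$density                    # that share divided by the bin width

# Proportions sum to 1; densities integrate to 1 over the axis instead.
sum(proportion)
sum(density * width)
```

With a width of 0.5, every density value here is twice the corresponding proportion.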

One way to get correct percentages is to calculate them before plotting. For this I used the function ddply() from the plyr package. Within each ExperimentCohort, the proportions are calculated with prop.table() and table() and saved as prop. names() and table() are then used to get back the matching LetterGrade values.

    library(plyr)

    # In-cohort proportion of each letter grade
    df.new <- ddply(df, .(ExperimentCohort), summarise,
                    prop = prop.table(table(LetterGrade)),
                    LetterGrade = names(table(LetterGrade)))

    head(df.new)
      ExperimentCohort       prop LetterGrade
    1              One 0.21739130           A
    2              One 0.08695652           B
    3              One 0.13043478           C
    4              One 0.13043478           D
    5              One 0.30434783           E
    6              One 0.13043478           F

Now use this new data frame for plotting. Because the proportions are already calculated, supply them as the y values and add stat="identity" inside geom_bar().

    ggplot(df.new, aes(LetterGrade, prop, fill = ExperimentCohort)) +
      geom_bar(stat = "identity", position = 'dodge')
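If plyr is not available, the same in-cohort proportions can probably be computed with base R alone. The sketch below is an alternative, not the method used above. The name df.base and the seed are assumptions introduced for illustration, and the data frame is rebuilt from the question's example so the block stands on its own.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(
  ID = 1:60,
  LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
  ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE)
)

# Cross-tabulate cohort (rows) against letter grade (columns)
counts <- table(df$ExperimentCohort, df$LetterGrade,
                dnn = c("ExperimentCohort", "LetterGrade"))

# margin = 1 normalises within each row, i.e. within each cohort
props <- prop.table(counts, margin = 1)

# Long format: one row per cohort/grade pair, with a prop column
df.base <- as.data.frame(props, responseName = "prop")
head(df.base)
```

Unlike the ddply() result, this version also keeps cohort/grade combinations with zero students (as prop = 0). It should work in place of df.new in the geom_bar(stat="identity") call.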


Didzis Elferts answered Sep 28 '22 08:09

You can also do this by creating a weight column that sums to 1 for each group:

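    library(dplyr)
    library(ggplot2)

    # Each row gets weight 1 / (size of its cohort), so bar heights are in-cohort proportions
    ggplot(df %>%
             group_by(ExperimentCohort) %>%
             mutate(weight = 1 / n()),
           aes(x = LetterGrade, fill = ExperimentCohort)) +
      geom_histogram(aes(weight = weight), stat = 'count', position = 'dodge')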
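A quick way to see why the weight trick works is to check it numerically. This is a base R sketch under assumptions: the seed is arbitrary, and ave() stands in for the group_by()/mutate() step. Summing weights of 1/n within a cohort is the same as dividing that cohort's counts by n.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(
  ID = 1:60,
  LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
  ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE)
)

# weight = 1 / (number of students in that row's cohort)
df$weight <- 1 / ave(seq_along(df$ID), df$ExperimentCohort, FUN = length)

# The weights in each cohort add up to 1
cohort_totals <- tapply(df$weight, df$ExperimentCohort, sum)

# Weighted counts per cohort/grade versus directly computed proportions
weighted <- xtabs(weight ~ ExperimentCohort + LetterGrade, data = df)
direct <- prop.table(table(df$ExperimentCohort, df$LetterGrade), margin = 1)
all.equal(as.vector(weighted), as.vector(direct))
```

The weighted counts match the row-normalised table, and that weighted count is what the count stat plots as the bar height.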
sirallen answered Sep 28 '22 10:09