 

Adding group mean lines to geom_bar plot and including in legend

I want to create a bar graph that also shows the mean value for the bars in each group, and that also shows the mean line in the legend.

I have been able to get this graph (a bar chart with the group means drawn as lines) using the code below, which is fine, but I would like to be able to see the mean lines in the legend.


##The data to be graphed is the proportion of persons receiving a treatment
## (num = numerator) in each population (denom = denominator). The population is
## grouped into two age groups (Age) and further divided by a categorical
## variable V1

###SET UP DATAFRAME###
library(ggplot2)
df <- data.frame(V1 = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
                 Age = c(rep(70, 5), rep(80, 5)),
                 num = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
                 denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227))

df$prop <- df$num / df$denom * 100

PopMean <- sum(df$num) / sum(df$denom) * 100

df70 <- df[df$Age == 70, ]
group70mean <- sum(df70$num) / sum(df70$denom) * 100

df80 <- df[df$Age == 80, ]
group80mean <- sum(df80$num) / sum(df80$denom) * 100

df$PopMean <- rep(PopMean, 10)
df$groupmeans <- c(rep(group70mean, 5), rep(group80mean, 5))
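These "means" are pooled proportions (total numerator over total denominator), which is the same as the denominator-weighted mean of `prop`, not the plain average of the bar heights. As a sketch, the per-row group means can also be computed without splitting the data frame by using base R's `ave()`; the column name `groupmeans2` is made up here only so it can be compared with the hand-computed version.

```r
# Same data as in the question
df <- data.frame(V1 = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
                 Age = rep(c(70, 80), each = 5),
                 num = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
                 denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227))
df$prop <- df$num / df$denom * 100

# Pooled proportion per age group, repeated on every row of that group:
# sum(num) / sum(denom) within each Age level
df$groupmeans2 <- ave(df$num, df$Age, FUN = sum) /
  ave(df$denom, df$Age, FUN = sum) * 100

# The pooled value equals the denominator-weighted mean of prop
is70 <- df$Age == 70
w70 <- weighted.mean(df$prop[is70], df$denom[is70])
print(c(pooled = df$groupmeans2[1], weighted = w70))
```

The `ave()` version scales to any number of age groups without creating `df70`, `df80`, and so on.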

I want the plot to look like the one produced by the code below, but with the mean lines also shown in the legend, labelled 'mean of group' or similar.

#basic plot
P <- ggplot(df, aes(x = factor(Age), y = prop, fill = factor(V1))) +
  geom_bar(position = position_dodge(), colour = 'black', stat = "identity")

P
####add mean lines
P + geom_errorbar(aes(y = groupmeans, ymax = groupmeans, ymin = groupmeans),
                  col = "red", lwd = 2)

Adding show.legend=TRUE overlays the error bars onto the existing fill (V1) legend rather than showing them separately. If there is a way of showing geom_errorbar as a separate legend entry, this is probably the simplest solution.

I have also tried various things with geom_line. The code below produces a line for the population mean value, but it runs between the centres of the x positions rather than covering the full width of the bars. It does produce a legend, but one showing a colour bar rather than a line.

P + geom_line(aes(y = PopMean, group = PopMean, color = PopMean), lwd = 1)
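`geom_line()` joins x positions, so on a discrete axis it cannot reach past the centres of the outer bars, and mapping `colour` to a numeric column produces a continuous colour bar. For a single reference value such as the population mean, one alternative is `geom_hline()`, which spans the whole panel; mapping `colour` to a constant string inside `aes()` gives a discrete legend entry drawn as a line. A sketch under those assumptions (the label "Population mean" and the colour blue are arbitrary choices):

```r
library(ggplot2)

# Same data as in the question
df <- data.frame(V1 = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
                 Age = rep(c(70, 80), each = 5),
                 num = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
                 denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227))
df$prop <- df$num / df$denom * 100
df$PopMean <- sum(df$num) / sum(df$denom) * 100

p_pop <- ggplot(df, aes(x = factor(Age), y = prop)) +
  geom_bar(aes(fill = factor(V1)), position = position_dodge(),
           colour = "black", stat = "identity") +
  # Constant label mapped to colour: one discrete legend entry, drawn as a line
  geom_hline(aes(yintercept = PopMean, colour = "Population mean")) +
  scale_colour_manual(name = NULL, values = "blue")
p_pop
```

Moving `fill` into the bar layer keeps it from being inherited by the line layer, so the two legends stay separate.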

If I try to draw lines for the group means, the lines are not visible, because each group's line covers only a single x position (so it is just a point).

P + geom_line(aes(y = groupmeans, group = groupmeans, color = groupmeans))
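Because `x = factor(Age)` is discrete, each age group sits at integer position 1, 2, and with the default bar width of 0.9 the dodged bars of a group span roughly that position ± 0.45. ggplot2 accepts continuous positions on a discrete position scale, so one option is to draw each group mean with `geom_segment()` from `x - 0.45` to `x + 0.45`. This is a sketch: the 0.45 half-width is an assumption tied to the default `width = 0.9`, and `means_df`, `xpos` and `groupmean` are hypothetical names introduced here.

```r
library(ggplot2)

# Same data as in the question
df <- data.frame(V1 = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
                 Age = rep(c(70, 80), each = 5),
                 num = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
                 denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227))
df$prop <- df$num / df$denom * 100

# One row per age group: pooled proportion and its position on the x axis
means_df <- aggregate(cbind(num, denom) ~ Age, data = df, FUN = sum)
means_df$groupmean <- means_df$num / means_df$denom * 100
means_df$xpos <- as.numeric(factor(means_df$Age))  # 1, 2, same order as factor(Age)

p_seg <- ggplot(df, aes(x = factor(Age), y = prop)) +
  geom_bar(aes(fill = factor(V1)), position = position_dodge(),
           colour = "black", stat = "identity") +
  # Assumed half-width 0.45 = default bar width 0.9 / 2
  geom_segment(data = means_df, inherit.aes = FALSE,
               aes(x = xpos - 0.45, xend = xpos + 0.45,
                   y = groupmean, yend = groupmean, colour = "Group mean")) +
  scale_colour_manual(name = NULL, values = "red")
p_seg
```

Using a one-row-per-group data frame also avoids drawing each mean line five times on top of itself.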

I also tried to get around this with a faceted plot, although this requires me to pretend my categorical variable is numeric to get it to work.

###set up new df
df2 <- df
df2$V1 <- rep(c(1, 2, 3, 4, 5), 2)

P <- ggplot(df2, aes(x = factor(V1), y = prop, fill = factor(V1))) +
  geom_bar(position = position_dodge(),
           colour = 'black', stat = "identity", width = 1)

P + facet_grid(. ~ Age)

P + facet_grid(. ~ Age) +
  geom_line(aes(y = groupmeans, group = groupmeans, color = groupmeans))

[Figure: facet plot]

This lets me show the mean lines with geom_line, so a legend does appear (although it doesn't look right: it shows a colour gradient rather than coloured lines). However, the lines still do not span the full width of the bars, and my x-axis now needs relabelling to show S1, S2, etc. rather than the numbers 1, 2, 3.
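On the faceting route, converting `V1` to numbers should not be necessary: a character `V1` can go directly on the x axis, which keeps the S1 to S5 labels. `geom_hline()` with `yintercept = groupmeans` then draws each age group's mean across the full width of its own panel, and mapping `colour` to a constant string gives a discrete legend with a line key instead of a colour gradient. A sketch of that combination (the label "Mean of group" is an arbitrary choice):

```r
library(ggplot2)

# Same data as in the question, V1 kept as character labels
df <- data.frame(V1 = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
                 Age = rep(c(70, 80), each = 5),
                 num = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
                 denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227))
df$prop <- df$num / df$denom * 100
df$groupmeans <- ave(df$num, df$Age, FUN = sum) / ave(df$denom, df$Age, FUN = sum) * 100

p_facet <- ggplot(df, aes(x = V1, y = prop)) +
  geom_bar(aes(fill = V1), colour = "black", stat = "identity", width = 1) +
  # Each panel only receives its own rows, so the line sits at that panel's group mean
  geom_hline(aes(yintercept = groupmeans, colour = "Mean of group")) +
  facet_grid(. ~ Age) +
  scale_colour_manual(name = NULL, values = "red")
p_facet
```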

To sum up - is there a way of showing error bar lines separately in the legend?

If not, and I use faceting, how do I correct the legend appearance and relabel the x-axis with my categorical variable? And is it possible to get the line to go the full width of the plot?

Or is there an alternate solution that I am missing!?

Thanks

asked Jan 10 '17 at 13:01 by ekmz


People also ask

What is group in geom_bar?

group="whatever" is a "dummy" grouping that overrides the default behaviour, which is to group by the discrete variables in the plot (typically just the x variable). geom_bar groups by x by default so that it counts the rows in each level of the x variable separately.
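A small sketch of what the override changes, using `after_stat(prop)` (available in ggplot2 3.3.0 or later) on an invented three-row data frame: with the default grouping each x level is its own group, so every bar's proportion is 1; with a dummy `group = 1` all rows form one group and the proportions become shares of the total.

```r
library(ggplot2)

# Invented toy data: two "a" rows and one "b" row
d <- data.frame(cut = c("a", "a", "b"))

# Default grouping: one group per x level, so prop is 1 for every bar
p_default <- ggplot(d, aes(x = cut)) + geom_bar(aes(y = after_stat(prop)))

# Dummy grouping: a single group, so prop is each level's share of all rows
p_single <- ggplot(d, aes(x = cut)) + geom_bar(aes(y = after_stat(prop), group = 1))

layer_data(p_default)$y
layer_data(p_single)$y
```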

What is the difference between geom_bar() and geom_bar(stat = "identity")?

By default, geom_bar uses stat="count" (ggplot2 versions before 2.0.0 used stat="bin"). This makes the height of each bar equal to the number of cases in each group, so with the default stat you cannot map both x and y. If you want the heights of the bars to represent values in the data, use stat="identity" and map a value to the y aesthetic.
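A minimal sketch of the two behaviours on an invented two-row data frame: with the default stat the bar heights are row counts, while `stat = "identity"` takes the heights from the `v` column.

```r
library(ggplot2)

# Invented example: one row per group, with a value to plot
d <- data.frame(g = c("a", "b"), v = c(3, 5))

# Default stat = "count": height is the number of rows for each g (1 each here)
p_count <- ggplot(d, aes(x = g)) + geom_bar()

# stat = "identity": height is taken directly from v
p_identity <- ggplot(d, aes(x = g, y = v)) + geom_bar(stat = "identity")

layer_data(p_count)$y
layer_data(p_identity)$y
```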

What does geom_col() do?

geom_col() is essentially geom_bar() with the statistical transformation fixed to identity. The bar heights are taken directly from the y values in the selected dataset instead of being computed from counts.
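In the question's plots, this means `geom_bar(stat = "identity")` could be written as `geom_col()`. A minimal check on invented data that the two produce the same bar heights:

```r
library(ggplot2)

# Invented example data
d <- data.frame(g = c("a", "b"), v = c(3, 5))

p_col <- ggplot(d, aes(x = g, y = v)) + geom_col()
p_bar <- ggplot(d, aes(x = g, y = v)) + geom_bar(stat = "identity")

# Both layers report the bar tops taken straight from v
layer_data(p_col)$ymax
layer_data(p_bar)$ymax
```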


1 Answer

To get a legend entry for geom_errorbar, you need to map colour inside aes(). As you want only one category (here red), I've created a dummy variable first:

df$mean <- "Mean"
ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
  geom_bar(position=position_dodge(), colour='black',stat="identity") +
  geom_errorbar(aes(ymax = groupmeans,
                    ymin = groupmeans, colour = mean), lwd = 2) +
  scale_colour_manual(name="",values = "#ff0000")
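The same trick extends to more than one reference line: each layer maps `colour` to its own constant label, and `scale_colour_manual()` assigns one colour per label. A sketch that adds the population mean as a `geom_hline()` next to the group-mean error bars; the labels and the blue colour are arbitrary choices, not part of the answer above.

```r
library(ggplot2)

# Same data as in the question
df <- data.frame(V1 = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
                 Age = rep(c(70, 80), each = 5),
                 num = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
                 denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227))
df$prop <- df$num / df$denom * 100
df$groupmeans <- ave(df$num, df$Age, FUN = sum) / ave(df$denom, df$Age, FUN = sum) * 100
df$PopMean <- sum(df$num) / sum(df$denom) * 100

p_both <- ggplot(df, aes(x = factor(Age), y = prop)) +
  geom_bar(aes(fill = factor(V1)), position = position_dodge(),
           colour = "black", stat = "identity") +
  # Group means: flat error bars spanning each dodged group
  geom_errorbar(aes(ymin = groupmeans, ymax = groupmeans, colour = "Group mean")) +
  # Population mean: a horizontal line across the whole panel
  geom_hline(aes(yintercept = PopMean, colour = "Population mean")) +
  scale_colour_manual(name = NULL,
                      values = c("Group mean" = "red", "Population mean" = "blue"))
p_both
```

Naming the `values` vector ties each colour to its label explicitly, so the mapping does not depend on the order of the legend entries.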


answered Sep 27 '22 at 01:09 by timat