
Condition a ..count.. summation on the faceting variable

Tags: r, ggplot2

I'm trying to annotate a bar chart with the percentage of observations falling into each bucket, within a facet. This is very closely related to the question "Show % instead of counts in charts of categorical variables", but faceting introduces a wrinkle. The answer to the related question is to use stat_bin with the text geom and construct the label like so:

 stat_bin(geom="text", aes(x = bins,
         y = ..count..,
         label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
         ))

This works fine for an unfaceted plot. With facets, however, sum(..count..) sums over the entire collection of observations without regard for the facets. The plot below illustrates the issue: note that the percentages do not sum to 100% within a panel.

[Figure: faceted bar chart whose percentage labels do not sum to 100% within a panel]

Here is the actual code for the figure above:

 g.invite.distro <- ggplot(data = df.exp) +
   geom_bar(aes(x = invite_bins)) +
   facet_wrap(~cat1, ncol = 3) +
   stat_bin(geom = "text",
            aes(x = invite_bins,
                y = ..count..,
                label = paste(round(100 * (..count.. / sum(..count..)), 1), "%", sep = "")),
            vjust = -1, size = 3) +
   theme_bw() +
   scale_y_continuous(limits = c(0, 3000))

UPDATE: As requested, here's a small example reproducing the issue:

df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
ggplot(data = df) + geom_bar(aes(x = x)) +
 stat_bin(geom = "text", aes(
         x = x,
         y = ..count.., label = ..count../sum(..count..)), vjust = -1) +
 facet_wrap(~f)
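
To make the discrepancy concrete, the numbers behind this small example can be tabulated outside ggplot. This is only a base R sketch of the arithmetic as described in the question, not of what ggplot2 does internally; the names counts, global_share and facet_share are made up for illustration.

```r
# Same toy data as in the small example above.
df <- data.frame(x = c('a', 'a', 'b', 'b'), f = c('c', 'd', 'd', 'd'))

# Cross-tabulate: rows are the x categories, columns are the facets.
counts <- table(df$x, df$f)

# Shares of ALL observations, which is what the question reports
# ..count../sum(..count..) as producing.
global_share <- counts / sum(counts)

# Shares within each facet, which is what the labels should show
# (margin = 2 divides each column by its own total).
facet_share <- prop.table(counts, margin = 2)

print(round(global_share, 3))
print(round(facet_share, 3))
```

The columns of facet_share each sum to 1, while those of global_share only sum to 1 when taken together.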

[Figure: output of the small example above]

John Horton asked Jul 19 '12 18:07


1 Answer

Update: geom_bar requires stat = "identity".

Sometimes it's easier to obtain summaries outside the call to ggplot.

df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))

# Load packages
library(ggplot2)
library(plyr)

# Obtain summary. 'Freq' is the count, 'pct' is the percent within each 'f'
m <- ddply(data.frame(table(df)), .(f), mutate, pct = round(Freq/sum(Freq) * 100, 1))

# Plot the data using the summary data frame
ggplot(data = m, aes(x = x, y = Freq)) + 
   geom_bar(stat = "identity", width = .7) +
   geom_text(aes(label = paste(pct, "%", sep = "")), vjust = -1, size = 3) +
   facet_wrap(~ f, ncol = 2) + theme_bw() +
   scale_y_continuous(limits = c(0, 1.2*max(m$Freq)))
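
If plyr is not available, the same summary can probably be built with base R alone. The following is a sketch of that alternative, not part of the original answer: it swaps ddply for ave(), and the label column is a name introduced here for illustration.

```r
df <- data.frame(x = c('a', 'a', 'b', 'b'), f = c('c', 'd', 'd', 'd'))

# One row per (x, f) combination; 'Freq' holds the count.
m <- as.data.frame(table(df))

# ave() returns, for every row, the total of Freq within that row's facet 'f',
# so the division yields the percent within each facet.
m$pct <- round(100 * m$Freq / ave(m$Freq, m$f, FUN = sum), 1)

# Hypothetical helper column holding the text to draw above each bar.
m$label <- paste0(m$pct, "%")

print(m)
```

The resulting data frame has the same x, f, Freq and pct columns as the plyr version, so it should drop into the plotting code above unchanged.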

[Figure: faceted bar chart with percentage labels computed within each facet]

Sandy Muspratt answered Nov 15 '22 20:11