The basic statistics available for categorical variables are counts and percentages.
Since we need to add percentages in the labels of the Y-axis, the keyword “labels” is used. Now use scales: : percent to convert the y-axis labels into a percentage. This will scale the y-axis data from decimal to percentage. It simply multiplies the value by 100.
Since this was answered there have been some meaningful changes to the ggplot
syntax. Summing up the discussion in the comments above:
require(ggplot2)
require(scales)
p <- ggplot(mydataf, aes(x = foo)) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
## version 3.0.0
scale_y_continuous(labels=percent)
Here's a reproducible example using mtcars
:
ggplot(mtcars, aes(x = factor(hp))) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(labels = percent) ## version 3.0.0
This question is currently the #1 hit on google for 'ggplot count vs percentage histogram' so hopefully this helps distill all the information currently housed in comments on the accepted answer.
Remark: If hp
is not set as a factor, ggplot returns:
this modified code should work
p = ggplot(mydataf, aes(x = foo)) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(formatter = 'percent')
if your data has NAs and you dont want them to be included in the plot, pass na.omit(mydataf) as the argument to ggplot.
hope this helps.
With ggplot2 version 2.1.0 it is
+ scale_y_continuous(labels = scales::percent)
As of March 2017, with ggplot2
2.2.1 I think the best solution is explained in Hadley Wickham's R for data science book:
ggplot(mydataf) + stat_count(mapping = aes(x=foo, y=..prop.., group=1))
stat_count
computes two variables: count
is used by default, but you can choose to use prop
which shows proportions.
If you want percentages on the y-axis and labeled on the bars:
library(ggplot2)
library(scales)
ggplot(mtcars, aes(x = as.factor(am))) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) +
scale_y_continuous(labels = percent) +
labs(title = "Manual vs. Automatic Frequency", y = "Percent", x = "Automatic Transmission")
When adding the bar labels, you may wish to omit the y-axis for a cleaner chart, by adding to the end:
theme(
axis.text.y=element_blank(), axis.ticks=element_blank(),
axis.title.y=element_blank()
)
Here is a workaround for faceted data. (The accepted answer by @Andrew does not work in this case.) The idea is to calculate the percentage value using dplyr and then to use geom_col to create the plot.
library(ggplot2)
library(scales)
library(magrittr)
library(dplyr)
binwidth <- 30
mtcars.stats <- mtcars %>%
group_by(cyl) %>%
mutate(bin = cut(hp, breaks=seq(0,400, binwidth),
labels= seq(0+binwidth,400, binwidth)-(binwidth/2)),
n = n()) %>%
group_by(cyl, bin) %>%
summarise(p = n()/n[1]) %>%
ungroup() %>%
mutate(bin = as.numeric(as.character(bin)))
ggplot(mtcars.stats, aes(x = bin, y= p)) +
geom_col() +
scale_y_continuous(labels = percent) +
facet_grid(cyl~.)
This is the plot:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With