library(ggplot2)
data = diamonds[, c('carat', 'color')]
data = data[data$color %in% c('D', 'E'), ]
I would like to compare the histogram of carat across color D and E, and use the classwise percentage on the y-axis. The solutions I have tried are as follows:
Solution 1:
ggplot(data=data, aes(carat, fill=color)) + geom_bar(aes(y=..density..), position='dodge', binwidth = 0.5) + ylab("Percentage") +xlab("Carat")
This is not quite right since the y-axis shows the height of the estimated density.
Solution 2:
ggplot(data=data, aes(carat, fill=color)) + geom_histogram(aes(y=(..count..)/sum(..count..)), position='dodge', binwidth = 0.5) + ylab("Percentage") +xlab("Carat")
This is also not I want, because the denominator used to calculate the ratio on the y-axis are the total count of D + E.
Is there a way to display the classwise percentages with ggplot2's stacked histogram? That is, instead of showing (# of obs in bin)/count(D+E) on y axis, I would like it to show (# of obs in bin)/count(D) and (# of obs in bin)/count(E) respectively for two color classes. Thanks.
You can also make histograms by using ggplot2 , “a plotting system for R, based on the grammar of graphics” that was created by Hadley Wickham. This post will focus on making a Histogram With ggplot2.
To change the number of bins in the histogram using the ggplot2 package library in the R Language, we use the bins argument of the geom_histogram() function. The bins argument of the geom_histogram() function to manually set the number of bars, cells, or bins the whole histogram will be divided into.
This R tutorial describes how to create a histogram plot using R software and ggplot2 package. The function geom_histogram() is used. You can also add a line for the mean using the function geom_vline.
Since we need to add percentages in the labels of the Y-axis, the keyword “labels” is used. Now use scales: : percent to convert the y-axis labels into a percentage. This will scale the y-axis data from decimal to percentage. It simply multiplies the value by 100.
When I tried Rorschach's answer it wasn't working for me for reasons that weren't readily apparent but I wanted to comment to say if you were open to adding density lines to a histogram once you do that it will automatically change the y axis to percent.
For example I have a count of "doses" by a binary outcome (0,1)
this code produces the following graph:
ggplot(data, aes(x=siadoses, fill=recallbin, color=recallbin)) +
geom_histogram(binwidth=1, alpha=.5, position='identity')
But when I include a density plot to my ggplot code and add y=..density.. I get this plot with percent on the Y
ggplot(data, aes(x=siadoses, fill=recallbin, color=recallbin)) +
geom_histogram(aes(y=..density..), binwidth=1, alpha=.5, position='identity') +
geom_density(alpha=.2)
kind of a work around to your original question but thought I would share.
It seems that binning the data outside of ggplot2 is the way to go. But I would still be interested to see if there is a way to do it with ggplot2.
library(dplyr)
breaks = seq(0,4,0.5)
data$carat_cut = cut(data$carat, breaks = breaks)
data_cut = data %>%
group_by(color, carat_cut) %>%
summarise (n = n()) %>%
mutate(freq = n / sum(n))
ggplot(data=data_cut, aes(x = carat_cut, y=freq*100, fill=color)) + geom_bar(stat="identity",position="dodge") + scale_x_discrete(labels = breaks) + ylab("Percentage") +xlab("Carat")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With