
normalized bar heights in ggplot

I am trying to compare two sets of count data with ggplot. The datasets are of different lengths and I am having trouble figuring out how to normalize the bar heights to the number of rows in each dataset. Please see the code examples below:

Example dataset

library(ggplot2)

set.seed(47)
BG.restricted.hs = round(runif(100, min = 47, max = 1660380))
FG.hs = round(runif(1000, min = 0, max = 1820786))

# Stack both datasets into one data frame, labelled by source
dat = data.frame(x = c(BG.restricted.hs, FG.hs),
                 source = c(rep("BG", length(BG.restricted.hs)),
                            rep("FG", length(FG.hs))))
# Assign each value to one of 200 equal-width bins
dat$bin = cut(dat$x, breaks = 200)

First attempt: no normalization. Bar heights are very different due to the dataset sizes!

ggplot(dat, aes(x = bin, fill = source)) +
    geom_bar(position = "identity", alpha = 0.2) +
    theme_bw() +
    scale_x_discrete(breaks = NULL)

Second attempt: tried normalization with the ..count.. property

ggplot(dat, aes(x = bin, fill = source)) +
    geom_bar(aes(y = ..count../sum(..count..)), alpha = 0.5, position = "identity")

This produced visually identical results, with only the overall y axis rescaled. It seems that ..count.. does not take the labels in the "source" column into account, and despite hours of experimenting I cannot find a way to make it do so. Is this possible?

user3396385 asked Mar 16 '23 19:03


2 Answers

stat_bin also returns density (the density of points in each bin, scaled to integrate to 1), so you can map that to y and group by source:

ggplot(dat,aes(x = bin, fill = source)) + 
    stat_bin(aes(group=source, y=..density..))
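
To see what the per-source normalization amounts to, here is a small base R sketch, separate from the ggplot call above. It rebuilds the question's data and compares dividing by the grand total with dividing each source by its own row count. The question's observation (only the y axis changed) is consistent with sum(..count..) being taken over both sources at once. The names counts, global and per_source are only illustrative.

```r
# Rebuild the example data from the question.
set.seed(47)
BG.restricted.hs <- round(runif(100, min = 47, max = 1660380))
FG.hs <- round(runif(1000, min = 0, max = 1820786))
dat <- data.frame(
    x = c(BG.restricted.hs, FG.hs),
    source = rep(c("BG", "FG"),
                 times = c(length(BG.restricted.hs), length(FG.hs)))
)
dat$bin <- cut(dat$x, breaks = 200)

# Counts per source (rows) and bin (columns).
counts <- table(dat$source, dat$bin)

# Dividing by the grand total rescales every bar by the same factor,
# so BG stays ten times smaller than FG overall.
global <- counts / sum(counts)

# Dividing each row by its own total puts both sources on the same scale.
per_source <- prop.table(counts, margin = 1)

rowSums(global)      # BG = 100/1100, FG = 1000/1100
rowSums(per_source)  # both rows sum to 1
```

Grouping by source in the plot is what asks ggplot to do the second kind of division rather than the first.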

jaimedash answered Mar 19 '23 14:03


I believe this should do it: set source as the group in the ggplot call, so that the density is computed within each source.

ggplot(dat, aes(x = bin, y = ..density.., group = source, fill = source)) +
    geom_bar(alpha = 0.5, position = 'identity')

DensityPlot
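
A note for readers on newer ggplot2 releases: as far as I know, stat_bin now expects a continuous x, and the stat_count behind geom_bar exposes count and prop rather than density, so calls that map ..density.. onto the factor bin may fail there. A minimal sketch under that assumption, using after_stat(prop) (available from ggplot2 3.3.0, if I recall correctly) together with group = source so that each source is divided by its own total:

```r
library(ggplot2)

# Rebuild the example data from the question.
set.seed(47)
BG.restricted.hs <- round(runif(100, min = 47, max = 1660380))
FG.hs <- round(runif(1000, min = 0, max = 1820786))
dat <- data.frame(
    x = c(BG.restricted.hs, FG.hs),
    source = rep(c("BG", "FG"),
                 times = c(length(BG.restricted.hs), length(FG.hs)))
)
dat$bin <- cut(dat$x, breaks = 200)

# group = source makes the proportion groupwise: each bar's height is
# that bin's share of the rows belonging to its own source.
p <- ggplot(dat, aes(x = bin, fill = source, group = source)) +
    geom_bar(aes(y = after_stat(prop)), position = "identity", alpha = 0.5) +
    theme_bw() +
    scale_x_discrete(breaks = NULL)
p
```

The heights here are proportions per bin rather than densities; with equal-width bins the two differ only by a constant factor, so the shapes of the BG and FG bars are directly comparable.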

vpipkt answered Mar 19 '23 16:03