Normalize then plot two histograms together in R

Tags: plot, r, histogram, normalization

I realize there are several posts asking how to plot two histograms together in R, either side by side (one plot with the bars next to each other) or overlaid, and others on how to normalize data. Following the advice I've found, I can do one or the other, but not both.

Here's the setup. I have two data frames of different lengths and would like to plot the volume of the objects in each data frame as a histogram. For example, I want to compare how many objects in data frame 1 fall between 0.1 and 0.2 um^3 with how many in data frame 2 fall in the same range, and so on for each bin. Either an overlaid or a side-by-side plot would work.

Since there are more measurements in one data frame than the other, I have to normalize, so I use:

read.csv(ctl)
read.csv(exp)
h1 = hist(ctl$Volume....)
h2 = hist(exp$Volume....)

# Normalize: convert the counts in each bin to a percentage of the total

h1$density = h1$counts / sum(h1$counts) * 100
plot(h1, freq=FALSE....)
h2$density = h2$counts / sum(h2$counts) * 100
plot(h2, freq=FALSE....)

I've been successful overlaying the un-normalized data using this method: http://www.r-bloggers.com/overlapping-histogram-in-r/ and also with the method from "plotting two histograms together", but I'm stuck on how to overlay the normalized data.

asked Mar 26 '15 19:03

Harry B

1 Answer

ggplot2 makes it relatively straightforward to plot normalized histograms of groups with unequal size. Here's an example with fake data:

library(ggplot2)

# Fake data (two normal distributions)
set.seed(20)
dat1 = data.frame(x=rnorm(1000, 100, 10), group="A")
dat2 = data.frame(x=rnorm(2000, 120, 20), group="B")
dat = rbind(dat1, dat2)

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Unnormalized")

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=..density..), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Normalized")
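
The same normalization can be reproduced in base graphics, which is where the question started. The following is only a sketch under a few assumptions: the vectors a and b are hypothetical stand-ins for the two Volume columns, both groups are binned on one shared set of breaks derived from the pooled range, and the colours are arbitrary. With freq = FALSE each histogram is drawn on the density scale, so the bar areas of each group sum to 1 regardless of how many measurements it has.

```r
# Fake data standing in for the two volume columns (hypothetical names a, b)
set.seed(20)
a <- rnorm(1000, 100, 10)
b <- rnorm(2000, 120, 20)

# Shared breaks of width 5 that cover the pooled range of both groups
rng <- range(c(a, b))
breaks <- seq(5 * floor(rng[1] / 5), 5 * ceiling(rng[2] / 5), by = 5)

# Bin both groups on the same breaks without drawing anything yet
ha <- hist(a, breaks = breaks, plot = FALSE)
hb <- hist(b, breaks = breaks, plot = FALSE)

# Bar areas (density * bin width) sum to 1 within each group
area_a <- sum(ha$density * diff(ha$breaks))
area_b <- sum(hb$density * diff(hb$breaks))

# Overlay the two density-scaled histograms with semi-transparent fills
ymax <- max(ha$density, hb$density)
plot(ha, freq = FALSE, col = rgb(1, 0, 0, 0.5), ylim = c(0, ymax),
     main = "Normalized", xlab = "x")
plot(hb, freq = FALSE, col = rgb(0, 0, 1, 0.5), add = TRUE)
```

Sharing the breaks matters: if each call to hist() picked its own bins, the bars of the two groups would not line up and could not be compared bin by bin.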

If you want to make overlaid density plots, you can do that as well. The adjust argument scales the bandwidth (values below 1 give a less smooth curve). Density plots are already normalized by default.

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_density(alpha=0.4, lwd=0.8, adjust=0.5)
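
To see what adjust does numerically, the sketch below uses stats::density() from base R rather than ggplot2, on the assumption that geom_density applies adjust in the same way, as a multiplier on the selected bandwidth. The variable names are illustrative only.

```r
set.seed(20)
x <- rnorm(1000, 100, 10)

d_default <- density(x)                # bandwidth chosen by the default rule
d_narrow  <- density(x, adjust = 0.5)  # same rule, bandwidth multiplied by 0.5

# Ratio of the two bandwidths actually used
bw_ratio <- d_narrow$bw / d_default$bw

# Approximate area under the curve from the evenly spaced evaluation grid
area <- sum(d_narrow$y) * diff(d_narrow$x[1:2])
```

Here bw_ratio comes out as 0.5 and area is close to 1, which is the sense in which a density curve is normalized whatever the sample size.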

UPDATE: In answer to your comment, the following code should do it. (..density..)/sum(..density..) makes the bar heights across both histograms add up to one, so the heights of each individual group add up to 0.5 (both groups use the same equal-width bins). You therefore have to multiply by 2 for each group to be individually normalized to 1. In general, you have to multiply by n, where n is the number of groups. This seems kind of kludgy and there may be a more elegant approach.

library(scales) # For percent_format()

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=2*(..density..)/sum(..density..)), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  scale_y_continuous(labels=percent_format())
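
The arithmetic behind the factor of 2 can be checked outside ggplot2. This sketch assumes base R's hist() as a stand-in for the binning step, two groups on shared equal-width breaks, and hypothetical variable names. It pools the per-group densities the way the stacked data frame would, then compares the rescaled values with each group's plain proportions.

```r
set.seed(20)
a <- rnorm(1000, 100, 10)
b <- rnorm(2000, 120, 20)

# Shared equal-width breaks covering both groups
rng <- range(c(a, b))
breaks <- seq(5 * floor(rng[1] / 5), 5 * ceiling(rng[2] / 5), by = 5)
ha <- hist(a, breaks = breaks, plot = FALSE)
hb <- hist(b, breaks = breaks, plot = FALSE)

# Pool the densities of both groups and divide by their overall sum
dens_all <- c(ha$density, hb$density)
n_groups <- 2
unscaled <- dens_all / sum(dens_all)  # each group's heights sum to 1 / n_groups
scaled   <- n_groups * unscaled       # each group's heights sum to 1

# The rescaled heights equal the per-group proportions count / total
idx_a  <- seq_along(ha$density)
prop_a <- ha$counts / sum(ha$counts)
prop_b <- hb$counts / sum(hb$counts)
```

The identity relies on every group having the same bin width, because each group's densities then sum to the same value (1 / width). In ggplot2 3.3.0 or later, mapping y to after_stat(density * width) should give the same per-group proportions without the group-count factor, though that variant is not part of the original answer and is offered here only as an untested suggestion.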

answered Oct 21 '22 18:10

eipi10
