I would appreciate any insight with the following.
I want to plot two datasets on one common histogram such that neither histogram has its top cut off and both distributions range from 0 to 1 on the y axis.
Let me explain what I mean. So far, I can plot the two datasets on one histogram nicely and force the integral of each distribution to be 1 by passing normed = 1 to ax.hist(), as seen in the following figure, which is produced from code like this:
x1, w1, patches1 = ax.hist(thing1, bins=300, edgecolor='b', color='b', histtype='stepfilled', alpha=0.2, normed = 1)
x2, w2, patches2 = ax.hist(thing2, bins=300, edgecolor='g', color='g', histtype='stepfilled', alpha=0.2, normed = 1)
In general, one probability distribution peaks much higher than the other, which makes the plot hard to read.
So, I've tried to normalise both such that they would both range from 0 to 1 on the y axis and still preserve their shape. For example, I've tried the following code:
for item in patches1:
    item.set_height(item.get_height()/sum(x1))
which is taken from the discussion "How to normalize a histogram in python?", but Python throws an error saying that there is no such attribute as get_height.
My question is simply: how can I make the y axis range from 0 to 1 while preserving the shape of both distributions?
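For context on the get_height error, here is a minimal sketch of what is probably going on. It assumes a reasonably recent Matplotlib and uses made-up stand-in data rather than the original thing1. With histtype='bar' the returned patches are rectangles, which have get_height, while with histtype='stepfilled' the returned patch is a single polygon outline, which does not.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.5, 0.25, 1000)  # stand-in data, not the original thing1

fig, ax = plt.subplots()
_, _, bar_patches = ax.hist(data, bins=30, histtype='bar')
_, _, step_patches = ax.hist(data, bins=30, histtype='stepfilled')

# Rectangles (from 'bar') expose get_height; the Polygon from 'stepfilled' does not.
bar_has_height = hasattr(bar_patches[0], 'get_height')
step_has_height = hasattr(step_patches[0], 'get_height')
print(type(bar_patches[0]).__name__, bar_has_height)
print(type(step_patches[0]).__name__, step_has_height)
```

So the loop over patches from the linked discussion only applies to bar-type histograms, not to the stepfilled style used here.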
I would recommend pre-computing the histograms with numpy and then plotting them in matplotlib using bar. Each histogram can then be normalized by amplitude simply by dividing it by its own maximum. Note that, to get any kind of meaningful comparison between the two histograms, it is best to use the same bins for both of them. Below is an example of how to do this:
from matplotlib import pyplot as plt
import numpy as np
##some random distribution
dist1 = np.random.normal(0.5, 0.25, 1000)
dist2 = np.random.normal(0.8, 0.1, 1000)
##computing the bin properties (same for both distributions)
num_bin = 50
bin_lims = np.linspace(0,1,num_bin+1)
bin_centers = 0.5*(bin_lims[:-1]+bin_lims[1:])
bin_widths = bin_lims[1:]-bin_lims[:-1]
##computing the histograms
hist1, _ = np.histogram(dist1, bins=bin_lims)
hist2, _ = np.histogram(dist2, bins=bin_lims)
##normalizing
hist1b = hist1/np.max(hist1)
hist2b = hist2/np.max(hist2)
fig, (ax1,ax2) = plt.subplots(nrows = 1, ncols = 2)
ax1.bar(bin_centers, hist1, width = bin_widths, align = 'center')
ax1.bar(bin_centers, hist2, width = bin_widths, align = 'center', alpha = 0.5)
ax1.set_title('original')
ax2.bar(bin_centers, hist1b, width = bin_widths, align = 'center')
ax2.bar(bin_centers, hist2b, width = bin_widths, align = 'center', alpha = 0.5)
ax2.set_title('amplitude-normalized')
plt.show()
And a picture of what this looks like:

Hope this helps.
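If you would rather keep the stepfilled look from the question, one possible variation on the approach above is to pass the pre-computed, peak-scaled counts back to ax.hist through its weights argument. This is only a sketch: peak_normalized_hist is a hypothetical helper name, and the data and bin limits are the same illustrative ones as in the example above. Keep in mind that after dividing by the maximum, the area under each histogram is no longer 1, so the y axis shows relative amplitude rather than a probability density.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

def peak_normalized_hist(ax, data, bin_edges, **kwargs):
    """Draw a histogram whose tallest bin has height 1 (hypothetical helper)."""
    counts, edges = np.histogram(data, bins=bin_edges)
    peak = counts.max()
    # Guard against an empty histogram to avoid dividing by zero.
    weights = counts / peak if peak > 0 else counts.astype(float)
    # Feed one sample per bin (its left edge), weighted by the scaled count,
    # so ax.hist reproduces the pre-computed bar heights.
    heights, _, patches = ax.hist(edges[:-1], bins=edges, weights=weights, **kwargs)
    return heights, patches

rng = np.random.default_rng(0)
dist1 = rng.normal(0.5, 0.25, 1000)
dist2 = rng.normal(0.8, 0.1, 1000)
bin_lims = np.linspace(0, 1, 51)  # same 50 bins for both datasets

fig, ax = plt.subplots()
h1, _ = peak_normalized_hist(ax, dist1, bin_lims, histtype='stepfilled',
                             color='b', edgecolor='b', alpha=0.2)
h2, _ = peak_normalized_hist(ax, dist2, bin_lims, histtype='stepfilled',
                             color='g', edgecolor='g', alpha=0.2)
ax.set_ylim(0, 1.05)
```

Both histograms then peak at exactly 1 while keeping their shapes, and the styling arguments from the question can be passed through unchanged.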