I would like to compare two histograms by having the Y axis show the percentage of each column from the overall dataset size instead of an absolute value. Is that possible? I am using Pandas and matplotlib. Thanks
The density=True option (normed=True for matplotlib < 2.2.0) returns a histogram for which np.sum(pdf * np.diff(bins)) equals 1, i.e. the area under the bars is 1, not the sum of their heights. If you want the bar heights themselves to sum to 1, you can use NumPy's histogram() and normalize the results yourself.
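The difference between the two normalizations can be checked numerically. This is a minimal sketch, not part of the original answer; the seed and the variable names (`area`, `fractions`) are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = rng.standard_normal(30)

# density=True scales the bars so that their total *area* is 1
pdf, bins = np.histogram(x, density=True)
area = np.sum(pdf * np.diff(bins))

# Manual normalization: each bar is the fraction of samples in that bin
counts, _ = np.histogram(x, bins=bins)
fractions = counts / counts.sum()

print("area under density histogram:", area)
print("sum of density bar heights:  ", pdf.sum())  # generally not 1
print("sum of fractions:            ", fractions.sum())
```

Multiplying `fractions` by 100 gives the percentage of the dataset that falls in each bin, which is what the question asks for.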
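# Left: density=True (area is 1). Right: counts divided by their total (heights sum to 1).
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(30)
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
ax[0].hist(x, density=True, color='grey')
hist, bins = np.histogram(x)
# align='edge' so each bar starts at its left bin edge instead of being centered on it
ax[1].bar(bins[:-1], hist / hist.sum(), width=(bins[1] - bins[0]), align='edge', color='grey')
ax[0].set_title('density=True')
ax[1].set_title('hist = hist / hist.sum()')
plt.show()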
Btw: Strange plotting glitch at the first bin of the left plot.
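For the original goal of comparing two datasets on a percentage axis, another option is to stay with hist() and pass per-sample weights of 1/N, then format the axis with matplotlib's PercentFormatter. This is a sketch under assumptions: the two sample arrays `a` and `b`, their sizes, and the labels are made up for illustration, and a pandas Series can be passed in the same way as a NumPy array.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

rng = np.random.default_rng(0)        # fixed seed, only for reproducibility
a = rng.standard_normal(200)          # hypothetical dataset 1
b = rng.standard_normal(50) + 1.0     # hypothetical dataset 2, different size

# Shared bin edges so the two histograms are directly comparable
bins = np.histogram_bin_edges(np.concatenate([a, b]), bins=10)

fig, ax = plt.subplots(figsize=(6, 4))
heights = {}
for data, label in [(a, 'a (n=200)'), (b, 'b (n=50)')]:
    # Each sample contributes 1/N, so the bar heights sum to 1 per dataset
    weights = np.ones(len(data)) / len(data)
    n, _, _ = ax.hist(data, bins=bins, weights=weights, alpha=0.5, label=label)
    heights[label] = n

# Show fractions 0..1 as 0%..100% on the Y axis
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
ax.set_ylabel('share of dataset')
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure. Because each dataset is normalized by its own size, the bars stay comparable even though the sample counts differ.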