I would like to create the following histogram (see image below) taken from the book "Think Stats". However, I cannot get them on the same plot. Each DataFrame takes its own subplot.
I have the following code:
    import nsfg
    import matplotlib.pyplot as plt

    df = nsfg.ReadFemPreg()
    preg = nsfg.ReadFemPreg()
    live = preg[preg.outcome == 1]

    first = live[live.birthord == 1]
    others = live[live.birthord != 1]

    #fig = plt.figure()
    #ax1 = fig.add_subplot(111)

    first.hist(column = 'prglngth', bins = 40, color = 'teal', \
               alpha = 0.5)
    others.hist(column = 'prglngth', bins = 40, color = 'blue', \
                alpha = 0.5)

    plt.show()
The above code does not work when I use ax = ax1, as suggested in "pandas multiple plots not working as hists", nor does the example in "Overlaying multiple histograms using pandas" do what I need. When I use the code as it is, it creates two windows with histograms. Any ideas how to combine them?
Here's an example of how I'd like the final figure to look:
To make multiple overlapping histograms, we need to use Matplotlib pyplot's hist function multiple times. For example, to make a plot with two histograms, we need to use pyplot's hist() function two times. Here we adjust the transparency with alpha parameter and specify a label for each variable.
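As a rough sketch of that approach: the snippet below calls hist() twice on the same axes, with alpha for transparency and label for the legend. The two arrays are made-up random data standing in for the "first" and "others" groups (they are not the NSFG values), and the shared bin edges are an assumption added so the bars line up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; remove it to get a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for the two groups (NOT the real pregnancy-length data)
rng = np.random.default_rng(0)
first = rng.normal(loc=39, scale=3, size=500)
others = rng.normal(loc=39, scale=3, size=500)

# Shared bin edges (40 bins) so both histograms use identical buckets
bins = np.linspace(27, 51, 41)

fig, ax = plt.subplots()
# Two separate hist() calls on the same axes; alpha < 1 keeps both visible
n_first, edges_first, _ = ax.hist(first, bins=bins, alpha=0.5, color="teal", label="first")
n_others, edges_others, _ = ax.hist(others, bins=bins, alpha=0.5, color="blue", label="others")
ax.set_xlabel("weeks")
ax.set_ylabel("frequency")
ax.legend()
```

In an interactive session, drop the Agg line and finish with plt.show().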
Bins are the buckets that your histogram will be grouped by. On the back end, Pandas will group your data into bins, or buckets. Then pandas will count how many values fell into that bucket, and plot the result.
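To see the counting step on its own, here is a small illustration using np.histogram as a stand-in for what happens behind the plot. The input values are invented for the example.

```python
import numpy as np

# Invented sample values, just to show how they fall into buckets
values = np.array([1, 2, 2, 3, 5, 6, 6, 6, 9, 10])

# Ask for 3 equal-width bins spanning the data range (1 to 10)
counts, edges = np.histogram(values, bins=3)

print(edges)   # bin edges: [ 1.  4.  7. 10.]
print(counts)  # values per bin: [4 4 2]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why the value 10 is counted in the third bucket.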
For plotting two histograms together, we have to call the hist() function separately for each dataset and pass a few settings: a label, which is a string that names the histogram in the legend, and alpha, which sets the amount of transparency.
As far as I can tell, pandas can't handle this situation. That's ok since all of their plotting methods are for convenience only. You'll need to use matplotlib directly. Here's how I do it:
    %matplotlib inline
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas
    #import seaborn
    #seaborn.set(style='ticks')

    np.random.seed(0)
    df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
    fig, ax = plt.subplots()

    a_heights, a_bins = np.histogram(df['A'])
    b_heights, b_bins = np.histogram(df['B'], bins=a_bins)

    width = (a_bins[1] - a_bins[0])/3

    ax.bar(a_bins[:-1], a_heights, width=width, facecolor='cornflowerblue')
    ax.bar(b_bins[:-1]+width, b_heights, width=width, facecolor='seagreen')
    #seaborn.despine(ax=ax, offset=10)
And that gives me:
In case anyone wants to plot one histogram over another (rather than alternating bars), you can simply call .hist() consecutively on the series you want to plot:
    %matplotlib inline
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas

    np.random.seed(0)
    df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])

    df['A'].hist()
    df['B'].hist()
This gives you:
Note that the order in which you call .hist() matters (the first one will be at the back).
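Since the histogram at the back can end up hidden behind the one in front, here is a possible variation on the same idea: pass an explicit ax to each .hist() call, share the bin edges, and add alpha so both stay visible. The colors, labels, and the use of np.histogram_bin_edges are my own choices for the sketch, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; remove it to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas

np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37, 2)), columns=['A', 'B'])

# Compute one set of bin edges from both columns so the bars line up
bins = np.histogram_bin_edges(df.to_numpy().ravel(), bins=10)

fig, ax = plt.subplots()
# Same axes for both calls; extra keywords (alpha, color, label) are passed on to matplotlib
df['A'].hist(ax=ax, bins=bins, alpha=0.5, color='cornflowerblue', label='A')
df['B'].hist(ax=ax, bins=bins, alpha=0.5, color='seagreen', label='B')
ax.legend()
```

With alpha below 1 the drawing order matters much less, because the back histogram shows through the front one.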