Multiple histograms in Pandas

I would like to recreate the following histogram (see image below) from the book "Think Stats". However, I cannot get both histograms onto the same plot: each DataFrame ends up in its own figure.

I have the following code:

```python
import nsfg
import matplotlib.pyplot as plt

df = nsfg.ReadFemPreg()
preg = nsfg.ReadFemPreg()
live = preg[preg.outcome == 1]

first = live[live.birthord == 1]
others = live[live.birthord != 1]

#fig = plt.figure()
#ax1 = fig.add_subplot(111)

first.hist(column='prglngth', bins=40, color='teal', alpha=0.5)
others.hist(column='prglngth', bins=40, color='blue', alpha=0.5)
plt.show()
```

The above code does not work when I use ax = ax1 as suggested in "pandas multiple plots not working as hists", and the example in "Overlaying multiple histograms using pandas" doesn't do what I need either. When I run the code as it is, it creates two windows with histograms. Any ideas how to combine them?

Here's an example of how I'd like the final figure to look: [image: target histogram from Think Stats]

asked Aug 28 '14 at 01:08 by Rohit


People also ask

How do I make a multiple histogram in Python?

To make multiple overlapping histograms, call Matplotlib pyplot's hist() function once per dataset. For example, for a plot with two histograms, call hist() twice; the alpha parameter adjusts the transparency, and the label parameter names each dataset.
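A minimal sketch of that recipe, using Matplotlib's object-oriented ax.hist rather than pyplot.hist (pyplot.hist simply draws on the current Axes, so the result is the same; ax.hist just makes the target Axes explicit). The arrays x and y are hypothetical random samples standing in for your own data:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data standing in for the two variables you want to compare.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)
y = rng.normal(loc=1.0, scale=1.0, size=500)

fig, ax = plt.subplots()

# Call hist() once per dataset on the same Axes.
# alpha=0.5 keeps the overlapping region visible; label= names each histogram for the legend.
counts_x, edges_x, _ = ax.hist(x, bins=30, alpha=0.5, label="x")
counts_y, edges_y, _ = ax.hist(y, bins=30, alpha=0.5, label="y")

ax.legend()
plt.show()
```

Because each call computes its own 30 bins over its own data range, the bar edges of the two histograms won't line up exactly; passing the same array of bin edges to both calls fixes that.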

What are bins in histogram pandas?

Bins are the intervals (buckets) into which a histogram groups your data. Pandas splits the range of the data into bins, counts how many values fall into each bin, and plots those counts.
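A small worked example of the binning step, using seven hypothetical values. Series.hist passes the data to Matplotlib's hist, which computes its counts with np.histogram, so calling np.histogram directly shows the numbers that would be plotted for bins=3:

```python
import numpy as np
import pandas as pd

# Hypothetical data: seven values between 1 and 4.
values = pd.Series([1, 2, 2, 3, 3, 3, 4])

# Split the range [1, 4] into 3 equal-width bins and count the values in each.
# Bins are half-open [a, b) except the last, which also includes its right edge,
# so the value 4 is counted in the last bin.
counts, edges = np.histogram(values, bins=3)
print(edges)   # [1. 2. 3. 4.]
print(counts)  # [1 2 4]
```

Calling values.hist(bins=3) would draw three bars with these heights.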

How do you plot two histograms together?

To plot two histograms together, call hist() separately for each dataset and pass a few keyword arguments: label, a string that names each histogram (used in the legend), and alpha, which sets the amount of transparency.


2 Answers

As far as I can tell, pandas can't handle this situation. That's ok since all of their plotting methods are for convenience only. You'll need to use matplotlib directly. Here's how I do it:

```python
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
#import seaborn
#seaborn.set(style='ticks')

np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37, 2)), columns=['A', 'B'])
fig, ax = plt.subplots()

a_heights, a_bins = np.histogram(df['A'])
b_heights, b_bins = np.histogram(df['B'], bins=a_bins)

width = (a_bins[1] - a_bins[0]) / 3

ax.bar(a_bins[:-1], a_heights, width=width, facecolor='cornflowerblue')
ax.bar(b_bins[:-1] + width, b_heights, width=width, facecolor='seagreen')
#seaborn.despine(ax=ax, offset=10)
```

And that gives me: [image: resulting figure]
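Applied to the question's data, the same side-by-side idea might look like this sketch. The nsfg module and its data files are not assumed here, so live is a hypothetical DataFrame with synthetic prglngth and birthord values; with the real data you would build first and others exactly as in the question. The sketch also uses align="edge" (instead of Matplotlib's default centered bars) so each pair of bars sits inside its one-week bin:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `live` DataFrame (synthetic, not NSFG data):
# pregnancy lengths in weeks and a birth-order column.
rng = np.random.default_rng(0)
live = pd.DataFrame({
    "prglngth": rng.integers(27, 46, size=1000),
    "birthord": rng.integers(1, 4, size=1000),
})
first = live[live.birthord == 1]
others = live[live.birthord != 1]

# One bin per week, shared by both groups so the bars line up.
bins = np.arange(27, 47)
first_heights, _ = np.histogram(first["prglngth"], bins=bins)
other_heights, _ = np.histogram(others["prglngth"], bins=bins)

width = 0.4  # each bar takes 40% of a one-week bin
fig, ax = plt.subplots()
# align="edge" puts each bar's left side at x; the second group is shifted by one bar width.
ax.bar(bins[:-1], first_heights, width=width, align="edge", color="teal", label="first babies")
ax.bar(bins[:-1] + width, other_heights, width=width, align="edge", color="blue", label="others")
ax.set_xlabel("pregnancy length (weeks)")
ax.set_ylabel("count")
ax.legend()
plt.show()
```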

answered Sep 17 '22 at 10:09 by Paul H


In case anyone wants to plot one histogram over another (rather than alternating bars), you can simply call .hist() consecutively on each Series you want to plot:

```python
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas

np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37, 2)), columns=['A', 'B'])

df['A'].hist()
df['B'].hist()
```

This gives you: [image: resulting figure]

Note that the order in which you call .hist() matters: the first histogram is drawn at the back.
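A sketch that keeps this overlay approach but makes the two histograms directly comparable, under two assumptions of mine: both Series share one set of bin edges (computed with np.histogram_bin_edges over both columns), and the Axes is passed explicitly with ax= instead of relying on the current Axes. Extra keyword arguments such as alpha and label are forwarded to Matplotlib's hist:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.normal(size=(37, 2)), columns=["A", "B"])

# One set of bin edges computed from both columns (the array is flattened),
# so bars from A and B cover identical intervals.
bins = np.histogram_bin_edges(df[["A", "B"]].to_numpy(), bins=10)

fig, ax = plt.subplots()
# Passing ax= draws both histograms on the same Axes;
# alpha makes the overlap visible and label= feeds the legend.
df["A"].hist(ax=ax, bins=bins, alpha=0.5, label="A")
df["B"].hist(ax=ax, bins=bins, alpha=0.5, label="B")
ax.legend()
plt.show()
```

The same pattern should carry over to the question's first and others frames by calling .hist() on their prglngth columns with a shared ax and shared bins.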

answered Sep 21 '22 at 10:09 by lin_bug