Is there an idiomatic way to plot the histogram of a feature for two classes? In pandas, I basically want
df.feature[df["class"] == 0].hist()
df.feature[df["class"] == 1].hist()
to be in the same plot. I could do
df.feature.hist(by=df["class"])
but that gives me two separate plots.
This seems to be a common task so I would imagine there to be an idiomatic way to do this. Of course I could manipulate the histograms manually to fit next to each other but usually pandas does that quite nicely.
Basically I want this matplotlib example in one line of pandas: http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I thought I was missing something, but maybe it is not possible (yet).
To plot a histogram using pandas, call the .hist() method on the dataframe. This draws a histogram for each numeric column in the dataframe.
Tweaking Matplotlib: Preferably, one that has tick marks and other features closer to the aesthetic you want to achieve. Turn the frame and grid lines off. Tweak the x-axis so that there is a gap between it and the y-axis, which seems more appropriate for histograms. Use color options that allow for separation between bins.
How about df.groupby("class").feature.hist()? To see overlapping distributions you'll probably need to pass alpha=0.4 to hist(). Alternatively, I'd be tempted to use a kernel density estimate instead of a histogram with df.groupby("class").feature.plot(kind='kde').
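To make the suggestion above concrete, here is a minimal sketch of the groupby approach drawing both classes on one axis. The data frame is synthetic and made up for illustration (the question does not show its data), and the shared `bins` array is my own addition so that both classes use the same bin edges; without it each group gets its own ten bins.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic two-class data (an assumption, purely for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": np.concatenate([rng.normal(0.0, 1.0, 500),
                               rng.normal(2.0, 1.0, 500)]),
    "class": np.repeat([0, 1], 500),
})

fig, ax = plt.subplots()

# 21 shared bin edges -> 20 bins per class over the full feature range.
bins = np.linspace(df["feature"].min(), df["feature"].max(), 21)

# One histogram per class, all drawn on the same axis; alpha keeps
# the overlapping bars visible.
df.groupby("class")["feature"].hist(alpha=0.4, bins=bins, ax=ax)
ax.set_xlabel("feature")
```

Note the bracket access `df["class"]` / `groupby("class")`: since `class` is a Python keyword, attribute access such as `df.class` is a syntax error. Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.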
As an example, I plotted the iris dataset's classes using:
iris.groupby("Name").PetalWidth.plot(kind='kde', ax=axs[1])
iris.groupby("Name").PetalWidth.hist(alpha=0.4, ax=axs[0])
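The two lines above assume that `iris` and `axs` already exist. A runnable sketch might look like the following, assuming `axs` comes from `plt.subplots(1, 2)`. The `iris` frame here is a synthetic stand-in with the same column names, not the real measurements, and the KDE panel needs SciPy installed, since pandas relies on it for kernel density estimates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris data (NOT the real measurements):
# three classes with roughly separated petal widths.
rng = np.random.default_rng(1)
iris = pd.DataFrame({
    "Name": np.repeat(["setosa", "versicolor", "virginica"], 50),
    "PetalWidth": np.concatenate([
        rng.normal(0.25, 0.10, 50),
        rng.normal(1.30, 0.20, 50),
        rng.normal(2.00, 0.27, 50),
    ]),
})

# Assumed setup: two side-by-side axes, histogram left, KDE right.
fig, axs = plt.subplots(1, 2, figsize=(10, 4))

iris.groupby("Name").PetalWidth.hist(alpha=0.4, ax=axs[0])
iris.groupby("Name").PetalWidth.plot(kind="kde", ax=axs[1])

axs[0].set_title("histogram")
axs[1].set_title("kernel density estimate")
```

Swapping in the real dataset only requires replacing the `iris` construction with whatever loader you have available; the plotting calls stay the same.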