I'm having an issue drawing a Pandas boxplot within a subplot. Based on the two ways I'm trying, creating the boxplot either removes all the subplots that I've already created, or plots the boxplot after the subplot grid. But I can't seem to draw it within the subplot grid.
import matplotlib.pyplot as plt
import pandas
from pandas import DataFrame, Series
data = {'day' : Series([1, 1, 1, 2, 2, 2, 3, 3, 3]),
'val' : Series([3, 4, 5, 6, 7, 8, 9, 10, 11])}
df = pandas.DataFrame(data)
The first thing I've tried is the following:
plt.figure()
plt.subplot(2, 2, 1)
plt.plot([1, 2, 3])
plt.subplot(2, 2, 4)
df.boxplot('val', 'day')
But this simply creates the plot outside of the subplots:
So, I then tried supplying the axis by hand:
plt.figure()
plt.subplot(2, 2, 1)
plt.plot([1, 2, 3])
plt.subplot(2, 2, 4)
ax = plt.gca()
df.boxplot('val', 'day', ax=ax)
But this simply destroyed the subplot grid altogether, as well as the initial image:
Any ideas how I can get my boxplot image to appear in the bottom right grid in the subplots (the one that's empty in the first set of images)?
This appears to be a bug, or at least undesirable behavior, in the pandas plotting setup. What is going on is that if you supply a by argument to boxplot, pandas issues its own subplots call, erasing any existing subplots. It apparently does this so that, if you want to plot more than one value, it will create subplots for each value (e.g., one boxplot for Y1 by day, another for Y2 by day, etc.).
However, what it seems like it should do, but it doesn't, is check to see if you're only plotting one value, and in that case, use the provided ax object (if any) instead of making its own subplots. When you only plot one value, it creates a 1-by-1 subplots grid, which isn't very useful. Its logic is also a bit strange, as it creates a grid based on the number of columns you're plotting (the length of the first argument), but it only does this if you supply a by argument. The intent seems to be to allow multi-box plots like df.boxplot(['col1', 'col2']), but in doing so it prevents your quite reasonable attempt to do df.boxplot('col1', 'grouper1').
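Whether a given pandas installation behaves this way can be checked directly. The following is only a minimal sketch, not taken from the pandas source: it passes an existing axes to boxplot together with by, then tests whether that axes is still attached to the figure afterwards. The Agg backend and the variable name kept are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"day": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "val": [3, 4, 5, 6, 7, 8, 9, 10, 11]})

fig = plt.figure()
fig.add_subplot(2, 2, 1).plot([1, 2, 3])
ax = fig.add_subplot(2, 2, 4)

# Same call as in the question, written with keyword arguments.
df.boxplot(column="val", by="day", ax=ax)

# True if pandas drew into the supplied axes; False if it cleared the
# figure and built its own subplot grid instead.
kept = ax in fig.axes
print("supplied axes still in figure:", kept)
```

If this prints False, the grid was rebuilt and a workaround is needed; if it prints True, the installed version honors the ax argument for a single column.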
I'd suggest raising an issue on the pandas bug tracker.
In the meantime, a somewhat hackish workaround is to do this:
df.pivot(index='val', columns='day', values='val').boxplot(ax=ax)
This reshapes your data so that the group-by values (the days) are columns. The reshaped table has lots of NAs for val values that don't occur with a particular day value, but these NAs are ignored when plotting, so you get the right plot in the right subplot position.
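For reference, here is the workaround as a complete script, as a sketch under a few assumptions: the arguments to pivot are passed by keyword (recent pandas versions no longer accept them positionally), the Agg backend is selected so it runs without a display, and the name wide is just an illustrative label for the reshaped table.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"day": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "val": [3, 4, 5, 6, 7, 8, 9, 10, 11]})

# One column per day and one row per distinct val; cells are NaN where
# that val does not occur on that day.
wide = df.pivot(index="val", columns="day", values="val")

fig = plt.figure()
fig.add_subplot(2, 2, 1).plot([1, 2, 3])
ax = fig.add_subplot(2, 2, 4)

# No by argument is used here, so pandas draws into the supplied axes
# and leaves the rest of the figure alone.
wide.boxplot(ax=ax)
```

Because the grouping now lives in the column layout rather than in a by argument, the existing subplot in the top-left cell is untouched and the three boxes land in the bottom-right cell.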