Issue with Pandas boxplot within a subplot

Tags: python, pandas

I'm having an issue drawing a Pandas boxplot within a subplot. Based on the two ways I'm trying, creating the boxplot either removes all the subplots that I've already created, or plots the boxplot after the subplot grid. But I can't seem to draw it within the subplot grid.

import matplotlib.pyplot as plt
import pandas
from pandas import DataFrame, Series

data = {'day' : Series([1, 1, 1, 2, 2, 2, 3, 3, 3]), 
        'val' : Series([3, 4, 5, 6, 7, 8, 9, 10, 11])}
df = pandas.DataFrame(data)

The first thing I've tried is the following:

plt.figure()

plt.subplot(2, 2, 1)
plt.plot([1, 2, 3])

plt.subplot(2, 2, 4)
df.boxplot('val', 'day')

But this simply creates the plot outside of the subplots:

[Image: Attempt A]

So, I then tried supplying the axis by hand:

plt.figure()

plt.subplot(2, 2, 1)
plt.plot([1, 2, 3])

plt.subplot(2, 2, 4)
ax = plt.gca()
df.boxplot('val', 'day', ax=ax)

But this simply destroyed the subplot grid altogether, as well as the initial image:

[Image: result of the second attempt]

Any ideas how I can get my boxplot image to appear in the bottom right grid in the subplots (the one that's empty in the first set of images)?

asked May 11 '13 17:05 by GeorgeLewis


1 Answer

This appears to be a bug, or at least undesirable behavior, in the pandas plotting setup. What is going on is that if you supply a by argument to boxplot, pandas issues its own subplots call, erasing any existing subplots. It apparently does this so that, if you want to plot more than one value, it will create subplots for each value (e.g., one boxplot for Y1 by day, another for Y2 by day, etc.).

However, what it seems like it should do, but it doesn't, is check to see if you're only plotting one value, and in that case, use the provided ax object (if any) instead of making its own subplots. When you only plot one value, it creates a 1-by-1 subplots grid, which isn't very useful. Its logic is also a bit strange, as it creates a grid based on the number of columns you're plotting (the length of the first argument), but it only does this if you supply a by argument. The intent seems to be to allow multi-box plots like df.boxplot(['col1', 'col2']), but in doing so it prevents your quite reasonable attempt to do df.boxplot('col1', 'grouper1').
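One way to check whether an installed pandas version still behaves this way is to count the figure's axes before and after the call. This is only a rough sketch: the helper name `axes_survive_grouped_boxplot` is made up for illustration, and the `Agg` backend is assumed just so the snippet runs without a display. The result depends on the pandas version, so the snippet reports it rather than assuming an outcome.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"day": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "val": [3, 4, 5, 6, 7, 8, 9, 10, 11]})


def axes_survive_grouped_boxplot(frame):
    """Hypothetical helper: report whether boxplot(by=...) kept the grid."""
    fig = plt.figure()
    first = fig.add_subplot(2, 2, 1)
    first.plot([1, 2, 3])
    target = fig.add_subplot(2, 2, 4)
    before = len(fig.axes)

    frame.boxplot(column="val", by="day", ax=target)

    # The grid survived only if both original axes are still on the figure.
    survived = first in fig.axes and target in fig.axes
    after = len(fig.axes)
    plt.close("all")  # also closes any extra figure pandas may have opened
    return before, after, survived


before, after, survived = axes_survive_grouped_boxplot(df)
print(pd.__version__, before, after, survived)
```

If `survived` comes back `False`, the grid was cleared as described above, and the reshaping workaround is still needed.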

I'd suggest raising an issue on the pandas bug tracker.

In the meantime, a somewhat hackish workaround is to do this:

df.pivot(index='val', columns='day', values='val').boxplot(ax=ax)

This reshapes your data so that the group-by values (the days) are columns. The reshaped table has lots of NAs for val values that don't occur with a particular day value, but these NAs are ignored when plotting, so you get the right plot in the right subplot position.
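Printing the pivoted table before plotting makes the reshaping easier to see. The sketch below assumes a pandas version where `pivot` accepts the keyword arguments `index`, `columns`, and `values`; the `Agg` backend and the variable names `wide`, `ax1`, and `ax4` are illustrative choices, not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"day": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "val": [3, 4, 5, 6, 7, 8, 9, 10, 11]})

# One column per day; rows are indexed by val, so each column holds
# its own day's values and NaN everywhere else.
wide = df.pivot(index="val", columns="day", values="val")
print(wide)

fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax1.plot([1, 2, 3])
ax4 = fig.add_subplot(2, 2, 4)

# No `by` argument here, so the boxes are drawn on the axes passed in.
wide.boxplot(ax=ax4)
```

Because the call has no `by` argument, the existing 2-by-2 grid is left alone and the three boxes (one per day) land in the bottom-right cell.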

answered Oct 03 '22 01:10 by BrenBarn