I want to place a series of (matplotlib) boxplots on a time axis. They are series of measurements taken on different days over a year. The dates are not evenly spaced, and I am interested in the variation over time.
I have a pandas DataFrame with indexes and series of numbers, more or less like this (notice the indexes):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(12345)
data = np.array( [ np.random.normal( i, 1, 10 ) for i in range(3) ] )
ii = np.array([ 3, 5, 8 ] )
df = pd.DataFrame( data=data, index=ii )
For each index, I need to make a boxplot, which is no problem:
plt.boxplot( [ df.loc[i] for i in df.index ], vert=True, positions=ii )
The problem is that I need to place the boxes on a time axis, i.e. place each box at a specific date:
np.random.seed(12345)
data = np.array( [ np.random.normal( i, 1, 10 ) for i in range(3) ] )
dates = pd.to_datetime( [ '2015-06-01', '2015-06-15', '2015-08-30' ] )
df = pd.DataFrame( data=data, index=dates )
plt.boxplot( [ df.loc[i] for i in df.index ], vert=True )
However, if I incorporate the positions:
ax.boxplot( [ df.loc[i] for i in df.index ], vert=True, positions=dates )
I get an error:
TypeError: Cannot compare type 'Timedelta' with type 'float'
A look at the docs shows:
plt.boxplot?
positions : array-like, default = [1, 2, ..., n]
Sets the positions of the boxes. The ticks and limits are automatically set to match the positions.
The following code is intended to clarify and narrow down the problem. The boxes should appear where the blue points are placed in the resulting figure.
np.random.seed(12345)
data = np.array( [ np.random.normal( i, 1, 10 ) for i in range(3) ] )
dates = pd.to_datetime( [ '2015-06-01', '2015-06-15', '2015-08-30' ] )
df = pd.DataFrame( data=data, index=dates )
fig, ax = plt.subplots( figsize=(10,5) )
x1 = pd.to_datetime( '2015-05-01' )
x2 = pd.to_datetime( '2015-09-30' )
ax.set_xlim( [ x1, x2 ] )
# ax.boxplot( [ df.loc[i] for i in df.index ], vert=True ) # Does not throw error, but plots nothing (out of range)
# ax.boxplot( [ df.loc[i] for i in df.index ], vert=True, positions=dates ) # This is what I'd like (throws TypeError)
ax.plot( dates, [ df.loc[i].mean() for i in df.index ], 'o' ) # Added to clarify the positions I aim for
Is there a way to place boxplots on a time axis?
I am using:
python: 3.4.3 + numpy: 1.11.0 + pandas: 0.18.0 + matplotlib: 1.5.1
So far, my best solution is to convert the units of the axis into a suitable integer unit and plot everything accordingly. In my case, that unit is days.
np.random.seed(12345)
data = np.array( [ np.random.normal( i, 1, 10 ) for i in range(3) ] )
dates = pd.to_datetime( [ '2015-06-01', '2015-06-15', '2015-08-30' ] )
df = pd.DataFrame( data=data, index=dates )
fig, ax = plt.subplots( figsize=(10,5) )
x1 = pd.to_datetime( '2015-05-01' )
x2 = pd.to_datetime( '2015-09-30' )
pos = ( dates - x1 ).days
ax.boxplot( [ df.loc[i] for i in df.index ], vert=True, positions=pos )
ax.plot( pos, [ df.loc[i].mean() for i in df.index ], 'o' )
ax.set_xlim( [ 0, (x2-x1).days ] )
ax.set_xticklabels( dates.date, rotation=45 )
The boxplots are placed at their correct positions, but the code seems a bit cumbersome to me.
More importantly, the units of the x-axis are no longer "time".
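One possible variation on the workaround above, as a minimal sketch rather than a definitive answer: convert the dates with `matplotlib.dates.date2num`, pass the resulting floats as `positions`, and then install a date locator and formatter so the axis reads as dates again. This assumes a reasonably recent matplotlib and pandas (not the 1.5.1 / 0.18.0 versions listed above). The `widths=5` value (in days) and the `'%Y-%m-%d'` label format are arbitrary choices for illustration.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(12345)
data = np.array([np.random.normal(i, 1, 10) for i in range(3)])
dates = pd.to_datetime(['2015-06-01', '2015-06-15', '2015-08-30'])
df = pd.DataFrame(data=data, index=dates)

# Convert dates to matplotlib's float date numbers (units of days).
pos = mdates.date2num(dates.to_pydatetime())
x1 = mdates.date2num(pd.to_datetime('2015-05-01').to_pydatetime())
x2 = mdates.date2num(pd.to_datetime('2015-09-30').to_pydatetime())

fig, ax = plt.subplots(figsize=(10, 5))

# widths is in the same units as positions (days); 5 is an arbitrary choice.
bp = ax.boxplot([df.loc[d].values for d in df.index], positions=pos, widths=5)
ax.plot(pos, df.mean(axis=1).values, 'o')  # per-date means, for reference

# boxplot puts fixed ticks at the box positions; replace them with
# date-aware ticks so the x-axis is labelled as dates.
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.set_xlim(x1, x2)
fig.autofmt_xdate(rotation=45)
```

The positions are still numbers internally, but they are numbers matplotlib itself interprets as dates, so the ticks and labels follow the time axis instead of being set by hand.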