Pandas boxplot x-axis setting

I want to create a boxplot of data collected from four different sites over the past twenty years (i.e. each site has 20 years of data). This will produce 80 boxes on the figure. To make the figure legible, I want each box offset, with a different box color for each site. This will yield a repeating series of boxes (e.g. boxes for site1, site2, site3, site4, site1, site2, site3, ...). Creating a boxplot is not a problem; offsetting the boxes is. For example:

import numpy as np
import pandas as pd
from pylab import *

first  = pd.DataFrame(np.random.rand(10,5),columns=np.arange(0,5))
second = pd.DataFrame(np.random.rand(10,5),columns=np.arange(5,10))

fig = figure( figsize=(9,6.5) )
ax  = fig.add_subplot(111)

# return_type='dict' is needed on newer pandas, which otherwise returns the Axes instead of the artist dict
box1 = first.boxplot(ax=ax,notch=False,widths=0.20,sym='',rot=-45,return_type='dict')
setp(box1['caps'],color='r',linewidth=2)
setp(box1['boxes'],color='r',linewidth=2)
setp(box1['medians'],color='r',linewidth=2)
setp(box1['whiskers'],color='r',linewidth=2,linestyle='-')

box2 = second.boxplot(ax=ax,notch=False,widths=0.20,sym='',rot=-45,return_type='dict')
setp(box2['caps'],color='k',linewidth=2)
setp(box2['boxes'],color='k',linewidth=2)
setp(box2['medians'],color='k',linewidth=2)
setp(box2['whiskers'],color='k',linewidth=2,linestyle='-')

Initially I hoped pandas would place the boxes on the x-axis by column name, but it seems to place them by column position, so both sets of boxes land on the same x positions. Can anyone recommend a method of offsetting the boxes so they do not lie on top of one another?

asked Jan 13 '14 20:01 by tnknepp


1 Answer

You need to specify the positions of the boxes with the positions keyword:

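# One position per column (each frame has 5); return_type='dict' returns the artist dict on newer pandas
box1 = first.boxplot(ax=ax,notch=False,widths=0.20,sym='',rot=-45, positions=np.arange(5), return_type='dict')
box2 = second.boxplot(ax=ax,notch=False,widths=0.20,sym='',rot=-45, positions=np.arange(5)+0.3, return_type='dict')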
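
For the four-site case in the question, the same idea can be generalized by computing one row of positions per site, centered on each year's tick. What follows is only a minimal sketch under some assumptions: it calls matplotlib's ax.boxplot directly rather than DataFrame.boxplot, it uses random placeholder data with 5 years instead of 20, and the helper grouped_positions, the site names, the colors, and the spacing value are hypothetical choices made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def grouped_positions(n_groups, n_series, spacing=0.2):
    """Hypothetical helper: x positions of shape (n_series, n_groups).

    Group g is centred on x = g, and the series are spread
    symmetrically around that centre, `spacing` apart.
    """
    centers = np.arange(n_groups, dtype=float)
    offsets = (np.arange(n_series) - (n_series - 1) / 2.0) * spacing
    return centers[np.newaxis, :] + offsets[:, np.newaxis]


# Placeholder data: 10 random samples per year for each site (not real measurements).
rng = np.random.default_rng(0)
n_years = 5
years = np.arange(2000, 2000 + n_years)
site_names = ["site1", "site2", "site3", "site4"]
data = {name: rng.random((10, n_years)) for name in site_names}
colors = ["r", "k", "b", "g"]

positions = grouped_positions(n_years, len(site_names))

fig, ax = plt.subplots(figsize=(9, 6.5))
for (name, samples), pos, color in zip(data.items(), positions, colors):
    # One box per column of `samples`, drawn at this site's offset positions.
    parts = ax.boxplot(samples, positions=pos, widths=0.15, sym="")
    for key in ("caps", "boxes", "medians", "whiskers"):
        plt.setp(parts[key], color=color, linewidth=2)

# Put one tick per year at the centre of each group of boxes.
ax.set_xticks(np.arange(n_years))
ax.set_xticklabels([str(y) for y in years], rotation=-45)
ax.set_xlim(-0.5, n_years - 0.5)
fig.savefig("grouped_boxplot.png")
```

With a DataFrame per site, passing frame.to_numpy() in place of the random arrays should work the same way. The ticks are reset after the loop because each boxplot call otherwise moves them to its own positions.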

Or you could shift the boxes to whichever side you please (this has the added benefit of keeping the labels centered):

disp = 0.15
for k in box1.keys():
    for line1,line2 in zip(box1[k],box2[k]):
        setp(line1,xdata=getp(line1,'xdata') - disp)
        setp(line2,xdata=getp(line2,'xdata') + disp)
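
The shifting approach can also be wrapped in a small function, which may be handy when there are more than two sets of boxes. This is a sketch rather than a definitive implementation: the name shift_boxplot is hypothetical, the data is random, it calls matplotlib's ax.boxplot directly, and it assumes the default patch_artist=False so that every artist in the returned dict is a Line2D.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def shift_boxplot(parts, dx):
    """Hypothetical helper: move every line of one boxplot by dx along x.

    `parts` is the dict returned by ax.boxplot (keys such as 'boxes',
    'medians', 'whiskers', 'caps', 'fliers'), assumed to hold Line2D artists.
    """
    for artists in parts.values():
        for line in artists:
            xdata = np.asarray(line.get_xdata(), dtype=float)
            line.set_xdata(xdata + dx)


rng = np.random.default_rng(0)
fig, ax = plt.subplots()

# Both calls use the default positions 1..5, so the boxes start out overlapping.
box1 = ax.boxplot(rng.random((10, 5)), widths=0.2, sym="")
box2 = ax.boxplot(rng.random((10, 5)), widths=0.2, sym="")

# Remember where the first median sat before shifting, for comparison.
median_before = np.array(box1["medians"][0].get_xdata(), dtype=float)

disp = 0.15
shift_boxplot(box1, -disp)  # first set moves left
shift_boxplot(box2, +disp)  # second set moves right
fig.savefig("shifted_boxplot.png")
```

Since the tick positions are untouched, each label stays centered between its pair of boxes.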
answered Sep 24 '22 05:09 by Alvaro Fuentes