matplotlib - producing boxplots in a loop

I'd like to plot several boxplots in one figure, on one axis. However, the data for the boxplots is too large to be read into memory at once, so I read it in chunks using pandas read_csv(). In each iteration I would like to produce some boxplots and add the new boxplots from iteration i to the same figure as the boxplots from iteration i-1, without holding on to the data from iteration i-1.

I want to stress that I do not need to update the data of an already existing boxplot. Rather, I get a new data column with each iteration and want to display a boxplot of that column next to the existing boxplots.

E.g.: Say I have

df = pd.DataFrame(np.random.rand(100,2))

Assume that I could read the columns only one after the other. How do I add the boxplot of the second column to the already existing boxplot of the first column to have the same result as ax.boxplot(df.values)?

user3820991 asked Sep 11 '14 11:09

2 Answers

The boxplot method has a positions argument. Using it, you can ensure that each boxplot (or group of boxplots) drawn inside a loop is placed at its own position on the axis.

Here's some code:

In [17]: x = pds.DataFrame(np.random.randn(10, 10))
In [18]: fig = plt.figure()
In [19]: ax = plt.subplot(111)
In [20]: for i in range(10):
    ...:     ax.boxplot(x.ix[:,i].values, positions = [i])
    ...:     
In [21]: ax.set_xlim(-0.5, 9.5)
In [22]: plt.show()
Korem answered Nov 15 '22 00:11

Please note the following update. This line:

ax.boxplot(x.ix[:,i].values, positions = [i])

should be replaced by:

ax.boxplot(x.iloc[:,i].values, positions = [i])

because ix is deprecated (and was removed entirely in pandas 1.0).
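For reference, here is a minimal self-contained sketch of the loop with the iloc replacement, adapted to the situation in the question where columns can only be read one after the other. It rests on a few assumptions that are not in the original posts: the in-memory CSV is a stand-in for the large file, the function name boxplots_one_column_at_a_time is hypothetical, and using read_csv(usecols=[i]) is just one possible way of loading a single column at a time.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def boxplots_one_column_at_a_time(csv_text, n_cols):
    """Draw one boxplot per CSV column, loading a single column per iteration.

    Hypothetical helper for illustration only, not part of pandas or matplotlib.
    """
    fig, ax = plt.subplots()
    for i in range(n_cols):
        # Assumption: read only column i (an integer in usecols selects by position).
        # The data from earlier iterations is not kept around.
        col = pd.read_csv(io.StringIO(csv_text), usecols=[i]).iloc[:, 0]
        # positions=[i] places this box next to the ones already drawn.
        ax.boxplot(col.values, positions=[i])
    # Make all boxes visible, as in the answer above.
    ax.set_xlim(-0.5, n_cols - 0.5)
    return fig, ax


# Stand-in for a file too large to load at once: a small random table as CSV text.
n_cols = 4
rng = np.random.default_rng(0)
csv_text = pd.DataFrame(rng.standard_normal((50, n_cols))).to_csv(index=False)

fig, ax = boxplots_one_column_at_a_time(csv_text, n_cols)
```

With a real file you would pass its path to read_csv instead of the StringIO object, and call plt.show() or fig.savefig(...) at the end. Only the drawn artists stay in the figure; each column can be garbage-collected once its boxplot has been added.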

Aurelie Giraud answered Nov 14 '22 23:11
