Weighted boxplot in Pandas

For the following dataframe (df),

     ColA      ColA_weights      ColB   ColB_weights
0  0.038671            1073  1.859599             1
1  20.39974           57362  10.59599             1
2  10.29974            5857  2.859599             1
3  5.040000            1288  33.39599             1
4  1.040000            1064  7.859599             1
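To try the code in this thread, the frame above can be rebuilt with pandas. This is a sketch: the values are copied from the printout, and the two weight columns are assumed to be plain integer counts.

```python
import pandas as pd

# The example frame from the question, rebuilt so the snippets can be run directly
df = pd.DataFrame({
    "ColA": [0.038671, 20.39974, 10.29974, 5.04, 1.04],
    "ColA_weights": [1073, 57362, 5857, 1288, 1064],
    "ColB": [1.859599, 10.59599, 2.859599, 33.39599, 7.859599],
    "ColB_weights": [1, 1, 1, 1, 1],
})
```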

I want to draw a weighted boxplot in which the weights for each box are given by ColA_weights and ColB_weights, respectively. At the moment I simply do:

df.boxplot(fontsize=12,notch=0,whis=1.5,vert=1,widths=0.2)

However, there seems to be no provision to include weights. Any solutions?

thanks!

user308827 asked May 01 '14 16:05


1 Answer

As suggested in the comments, here is a way to build a list in which each entry appears as many times as its weight indicates. This is probably not the cleverest solution, and someone may come up with a better one.

My example only handles column A, but you should be able to apply it the same way to column B:

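import matplotlib.pyplot as plt

# Build a flat list in which each ColA value appears ColA_weights times
weighted_appearances = []
for index, row in df.iterrows():
    # iterrows() upcasts this mixed int/float row to float64, so cast the weight back to int
    weighted_row = [row.ColA] * int(row.ColA_weights)
    weighted_appearances += weighted_row

plt.boxplot(weighted_appearances)
plt.show()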
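A sketch of the same expansion without the Python loop, assuming integer weights: `numpy.repeat` repeats each value by its weight in one call, and passing a list of arrays to `boxplot` draws ColA and ColB side by side. The dictionary name `expanded` is only for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "ColA": [0.038671, 20.39974, 10.29974, 5.04, 1.04],
    "ColA_weights": [1073, 57362, 5857, 1288, 1064],
    "ColB": [1.859599, 10.59599, 2.859599, 33.39599, 7.859599],
    "ColB_weights": [1, 1, 1, 1, 1],
})

# np.repeat expands each value by its integer weight in one vectorized call,
# replacing the row-by-row loop over df.iterrows()
expanded = {
    col: np.repeat(df[col].to_numpy(), df[col + "_weights"].to_numpy().astype(int))
    for col in ["ColA", "ColB"]
}

# boxplot accepts a list of arrays and draws one box per array
fig, ax = plt.subplots()
ax.boxplot(list(expanded.values()), notch=False, whis=1.5, widths=0.2)
ax.set_xticklabels(list(expanded.keys()))
```

This still materialises one entry per unit of weight, so memory use grows with the weights just as in the loop version; it only avoids Python-level iteration. Call `plt.show()` afterwards to display the figure.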

Pros: very simple to write, and in theory it works in all cases (though if your weights are not integers, you would have to convert or round them in a way you find acceptable).
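For non-integer weights, one possible conversion (an assumption, not something pandas or matplotlib does for you) is to rescale the weights to a fixed total and round them to integer repeat counts. The helper `weights_to_counts` and its `total` parameter are hypothetical names, and the values and weights below are made up; a smaller `total` also keeps the expanded array small, at the cost of precision.

```python
import numpy as np

def weights_to_counts(weights, total=1000):
    """Hypothetical helper: rescale non-negative weights so they sum to roughly
    `total`, then round each one to the nearest integer repeat count."""
    w = np.asarray(weights, dtype=float)
    scaled = w / w.sum() * total
    # np.rint rounds half to even, so the counts may not sum exactly to `total`
    return np.rint(scaled).astype(int)

values = np.array([0.5, 2.0, 3.5])
weights = np.array([0.2, 1.3, 0.5])  # made-up, non-integer weights
counts = weights_to_counts(weights, total=100)
expanded = np.repeat(values, counts)  # ready to pass to plt.boxplot
```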

Cons: not very efficient; if you are working with very large weights, you would have to find a way to "scale them down" to keep memory usage reasonable.
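A different technique that avoids expanding the data at all: compute weighted quartiles directly and hand the precomputed statistics to matplotlib's `Axes.bxp`, which draws boxes from dicts with `med`, `q1`, `q3`, `whislo`, `whishi` and `fliers` keys. The helpers `weighted_quantile` and `weighted_box_stats` are hypothetical, and the quantile rule used here (the smallest value whose cumulative weight share reaches q) is an assumption; it can differ slightly from the linear interpolation `plt.boxplot` applies to expanded data.

```python
import numpy as np
import matplotlib.pyplot as plt

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (inverted-CDF rule)."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / w.sum()
    idx = np.searchsorted(cdf, q, side="left")
    # Guard against floating-point round-off leaving cdf[-1] just below 1.0
    return v[min(idx, len(v) - 1)]

def weighted_box_stats(values, weights, whis=1.5, label=""):
    """Build one stats dict in the format Axes.bxp expects."""
    v = np.asarray(values, dtype=float)
    q1, med, q3 = (weighted_quantile(v, weights, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    # Whiskers reach the most extreme data points within whis * IQR of the box
    inside = v[(v >= q1 - whis * iqr) & (v <= q3 + whis * iqr)]
    lo, hi = inside.min(), inside.max()
    return {
        "label": label, "med": med, "q1": q1, "q3": q3,
        "whislo": lo, "whishi": hi,
        "fliers": v[(v < lo) | (v > hi)],
    }

values = [0.038671, 20.39974, 10.29974, 5.04, 1.04]
weights = [1073, 57362, 5857, 1288, 1064]
stats = weighted_box_stats(values, weights, label="ColA")

fig, ax = plt.subplots()
ax.bxp([stats], widths=0.2)
```

With the question's ColA weights, about 86% of the total weight sits on 20.39974, so all three quartiles land on that value and the other four points are drawn as fliers. Notches are not computed here; `shownotches=True` would additionally need `cilo` and `cihi` keys.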

LoicM answered Nov 15 '22 14:11