For the following dataframe (df):
       ColA  ColA_weights      ColB  ColB_weights
0  0.038671          1073  1.859599             1
1  20.39974         57362  10.59599             1
2  10.29974          5857  2.859599             1
3  5.040000          1288  33.39599             1
4  1.040000          1064  7.859599             1
I want to draw a weighted boxplot, where the weights for each box are given by ColA_weights and ColB_weights respectively. I simply do:
df.boxplot(fontsize=12,notch=0,whis=1.5,vert=1,widths=0.2)
However, there seems to be no way to pass weights. Any solutions?
Thanks!
As suggested in the comments, here is a way to build a list in which each value appears as many times as its weight indicates. This is probably not the most clever solution, and someone may come up with a better one.
My example only covers column A, but you can apply it the same way to column B:
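import matplotlib.pyplot as plt

weighted_appearances = []
for index, row in df.iterrows():
    # iterrows() upcasts each row to a common dtype (float here),
    # so the weight has to be cast back to int before repeating
    weighted_row = [row.ColA] * int(row.ColA_weights)
    weighted_appearances += weighted_row  # value appears `weight` times
plt.boxplot(weighted_appearances)
plt.show()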
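A vectorized variant of the same idea, as a minimal sketch: numpy.repeat builds the expanded array in one call instead of looping with iterrows(), and passing a list of arrays to plt.boxplot draws one box per column. The DataFrame below just re-types the example data from the question; the helper name expand is hypothetical, and the weights are assumed to be non-negative integers.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The example data from the question.
df = pd.DataFrame({
    "ColA": [0.038671, 20.39974, 10.29974, 5.040000, 1.040000],
    "ColA_weights": [1073, 57362, 5857, 1288, 1064],
    "ColB": [1.859599, 10.59599, 2.859599, 33.39599, 7.859599],
    "ColB_weights": [1, 1, 1, 1, 1],
})


def expand(values, weights):
    """Repeat each value as many times as its (integer) weight."""
    return np.repeat(np.asarray(values), np.asarray(weights, dtype=int))


# One expanded array per column, so each column gets its own box.
data = [expand(df[c], df[c + "_weights"]) for c in ["ColA", "ColB"]]

plt.boxplot(data, notch=False, whis=1.5, widths=0.2)
plt.xticks([1, 2], ["ColA", "ColB"])
plt.show()
```

Memory use is the same as with the loop (one entry per unit of weight); only the Python-level looping goes away.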
Pros: very simple to write, and it theoretically works in all cases (if your weights are not integers, though, you would have to convert or round them in a way that you find acceptable).
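For non-integer weights, one possible conversion (an assumption, not the only acceptable one) is to multiply every weight by a fixed resolution and round to the nearest integer before repeating. The helper to_integer_weights, its resolution parameter, and the sample weights are hypothetical, made up for this sketch.

```python
import numpy as np


def to_integer_weights(weights, resolution=100):
    """Convert fractional weights to integer repeat counts.

    `resolution` (hypothetical parameter) is how many copies one unit of
    weight becomes; each scaled weight is rounded to the nearest integer.
    """
    return np.rint(np.asarray(weights, dtype=float) * resolution).astype(int)


values = np.array([1.0, 2.0, 3.0])
weights = np.array([0.25, 1.5, 0.333])  # made-up fractional weights
counts = to_integer_weights(weights)     # -> [25, 150, 33]
expanded = np.repeat(values, counts)     # 208 values, ready for plt.boxplot
```

A higher resolution keeps the weight ratios more accurate at the cost of more memory, and any weight that rounds to 0 drops its value entirely.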
Cons: not very efficient; if you are working with really large weights, you would have to find a way to "scale down" those to keep memory usage reasonable.
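If memory is the problem, a different technique avoids expanding the data at all: compute the box statistics directly from the weights and pass them to matplotlib's Axes.bxp, which draws boxes from precomputed statistics. The sketch below assumes non-negative integer weights; the helper names weighted_percentile and weighted_box_stats are hypothetical. It tries to reproduce np.percentile's default linear interpolation on the virtually expanded data and the whisker rule of matplotlib.cbook.boxplot_stats for a scalar whis, so the boxes should match what the expanded-list approach draws.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def weighted_percentile(values, weights, q):
    """Percentile q (0-100) of `values` where each value is counted
    `weights` times. Mirrors np.percentile's default (linear) method on
    the expanded data without building it. Assumes non-negative integer
    weights with a positive sum."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=np.int64)
    order = np.argsort(v)
    v, cw = v[order], np.cumsum(w[order])
    pos = q / 100 * (cw[-1] - 1)  # index into the virtual expanded array
    lo, hi = int(np.floor(pos)), int(np.ceil(pos))
    # Element k of the sorted expanded array is the first value whose
    # cumulative weight exceeds k.
    v_lo = v[np.searchsorted(cw, lo, side="right")]
    v_hi = v[np.searchsorted(cw, hi, side="right")]
    return v_lo + (v_hi - v_lo) * (pos - lo)


def weighted_box_stats(values, weights, whis=1.5, label=None):
    """One stats dict in the format Axes.bxp expects, using the same
    whisker rule as matplotlib.cbook.boxplot_stats for a scalar whis."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights)
    q1, med, q3 = (weighted_percentile(x, w, q) for q in (25, 50, 75))
    iqr = q3 - q1
    present = x[w > 0]  # values that actually occur in the data
    inside_hi = present[present <= q3 + whis * iqr]
    inside_lo = present[present >= q1 - whis * iqr]
    # Whiskers reach the most extreme data point within whis * IQR,
    # but never end inside the box.
    whishi = max(inside_hi.max(), q3) if inside_hi.size else q3
    whislo = min(inside_lo.min(), q1) if inside_lo.size else q1
    fliers = present[(present < whislo) | (present > whishi)]
    return dict(med=med, q1=q1, q3=q3, whislo=whislo, whishi=whishi,
                fliers=fliers, label=label)


# The example data from the question.
df = pd.DataFrame({
    "ColA": [0.038671, 20.39974, 10.29974, 5.040000, 1.040000],
    "ColA_weights": [1073, 57362, 5857, 1288, 1064],
    "ColB": [1.859599, 10.59599, 2.859599, 33.39599, 7.859599],
    "ColB_weights": [1, 1, 1, 1, 1],
})

stats = [weighted_box_stats(df[c], df[c + "_weights"], whis=1.5, label=c)
         for c in ["ColA", "ColB"]]
fig, ax = plt.subplots()
ax.bxp(stats, widths=0.2)
plt.show()
```

Here memory grows with the number of rows rather than with the sum of the weights. Fractional weights would need a different weighted-quantile definition than this integer-count one.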