Matplotlib: avoiding overlapping datapoints in a "scatter/dot/beeswarm" plot

When drawing a dot plot using matplotlib, I would like to offset overlapping datapoints to keep them all visible. For example, if I have:

    CategoryA: 0,0,3,0,5
    CategoryB: 5,10,5,5,10

I want each of the CategoryA "0" datapoints to be set side by side, rather than right on top of each other, while still remaining distinct from CategoryB.

In R (ggplot2) there is a "jitter" option that does this. Is there a similar option in matplotlib, or is there another approach that would lead to a similar result?

Edit: to clarify, the "beeswarm" plot in R is essentially what I have in mind, and pybeeswarm is an early but useful start at a matplotlib/Python version.

Edit: to add that Seaborn's Swarmplot, introduced in version 0.7, is an excellent implementation of what I wanted.

iayork asked Dec 29 '11 18:12




1 Answer

Extending the answer by @user2467675, here’s how I did it:

    import numpy as np
    import matplotlib.pyplot as plt

    def rand_jitter(arr):
        # Gaussian noise with a standard deviation of 1% of the data range
        arr = np.asarray(arr, dtype=float)
        stdev = .01 * (arr.max() - arr.min())
        return arr + np.random.randn(len(arr)) * stdev

    def jitter(x, y, s=20, c='b', marker='o', cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, **kwargs):
        # Same call as plt.scatter, but both coordinates are jittered first
        return plt.scatter(rand_jitter(x), rand_jitter(y), s=s, c=c, marker=marker, cmap=cmap, norm=norm, vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths, **kwargs)

The stdev variable scales the jitter to 1% of the data range (max minus min), so that it is enough to be seen on different scales, but it assumes that the limits of the axes roughly match the minimum and maximum of the data.
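
To check what that scaling means for a particular dataset, the factor can be computed on its own. This is only a sketch and not part of the original answer: the helper name `jitter_stdev` and its `fraction` parameter are made up for illustration. It also shows an edge case: when every value is the same, the range is zero, so the points would not be moved at all.

```python
import numpy as np

def jitter_stdev(arr, fraction=0.01):
    """Noise standard deviation: a fraction of the data range (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    return fraction * (arr.max() - arr.min())

category_a = [0, 0, 3, 0, 5]     # data from the question
category_b = [5, 10, 5, 5, 10]

print(jitter_stdev(category_a))  # 1% of the range 0..5
print(jitter_stdev(category_b))  # 1% of the range 5..10
print(jitter_stdev([2, 2, 2]))   # zero range, so no jitter would be applied
```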

You can then call jitter instead of scatter.
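
If random noise is not wanted, the side-by-side placement described in the question could also be done deterministically. The following is a rough sketch of that alternative, not the method from this answer and not a full beeswarm layout: it only separates points whose values are exactly equal. The names `side_by_side` and `plot_categories` and the default `spacing` of 0.05 (in x-axis units) are assumptions chosen for illustration.

```python
from collections import defaultdict

def side_by_side(values, center, spacing=0.05):
    """Return x positions that place identical values next to each other.

    Points sharing a value are spread symmetrically around `center`;
    a value that occurs only once stays exactly at `center`.
    """
    groups = defaultdict(list)
    for i, v in enumerate(values):
        groups[v].append(i)

    xs = [float(center)] * len(values)
    for indices in groups.values():
        n = len(indices)
        for k, i in enumerate(indices):
            xs[i] = center + (k - (n - 1) / 2) * spacing
    return xs

def plot_categories(data, spacing=0.05):
    """Draw one column of dots per category (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so side_by_side works without it

    fig, ax = plt.subplots()
    for center, (label, values) in enumerate(data.items()):
        ax.scatter(side_by_side(values, center, spacing), values, label=label)
    ax.set_xticks(range(len(data)))
    ax.set_xticklabels(list(data))
    return ax

data = {"CategoryA": [0, 0, 3, 0, 5], "CategoryB": [5, 10, 5, 5, 10]}
# ax = plot_categories(data)
# ax.figure.savefig("dots.png")
```

Because the y values are left untouched, each dot still sits at its true value, and each category keeps its own column on the x axis.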

yoavram answered Oct 01 '22 13:10
