I'm trying to make an interactive program which primarily uses matplotlib to make scatter plots of rather a lot of points (10k-100k or so). Right now it works, but changes take too long to render. Small numbers of points are OK, but once the number rises things get frustrating in a hurry. So I'm working on ways to speed up scatter, but I'm not having much luck.
There's the obvious way to do things (the way it's implemented now). I realize the plot redraws without updating anything; I didn't want to skew the fps result with large calls to random.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import time
X = np.random.randn(10000) #x pos
Y = np.random.randn(10000) #y pos
C = np.random.random(10000) #will be color
S = (1+np.random.randn(10000)**2)*3 #size
#build the colors from a color map
colors = mpl.cm.jet(C)
#there are easier ways to do static alpha, but this allows
#per point alpha later on.
colors[:,3] = 0.1
fig, ax = plt.subplots()
fig.show()
background = fig.canvas.copy_from_bbox(ax.bbox)
#this makes the base collection
coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None',marker='D')
fig.canvas.draw()
sTime = time.time()
for i in range(10):
    print(i)
    #don't change anything, but redraw the plot
    ax.cla()
    coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None',marker='D')
    fig.canvas.draw()
print('%2.1f FPS'%( (time.time()-sTime)/10 ))
This gives a speedy 0.7 fps.
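One thing to watch when comparing the numbers: the value printed by the loop above is the elapsed time divided by the number of frames, which is seconds per frame; frames per second is the reciprocal. A minimal sketch of a timing helper follows. The name `measure_fps` is hypothetical (not part of matplotlib), and the Agg backend and the 3-frame count are assumptions chosen only so the sketch runs without a display.

```python
import time

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np


def measure_fps(draw, n_frames=10):
    """Call draw() n_frames times and return frames per second.

    Hypothetical helper for illustration; not a matplotlib function.
    """
    start = time.perf_counter()
    for _ in range(n_frames):
        draw()
    elapsed = time.perf_counter() - start
    # frames / seconds, not seconds / frames
    return n_frames / elapsed


rng = np.random.default_rng(0)
X = rng.standard_normal(10000)  # x pos
Y = rng.standard_normal(10000)  # y pos

fig, ax = plt.subplots()
ax.scatter(X, Y, s=3, marker='D', edgecolor='none')

# time a full redraw of the figure a few times
fps = measure_fps(fig.canvas.draw, n_frames=3)
print('%2.1f FPS' % fps)
```

The same helper can wrap either redraw strategy, so both are measured the same way.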
Alternatively, I can edit the collection returned by scatter. That way I can change color and position, but I don't know how to change the size of each point. I think that would look something like this:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import time
X = np.random.randn(10000) #x pos
Y = np.random.randn(10000) #y pos
C = np.random.random(10000) #will be color
S = (1+np.random.randn(10000)**2)*3 #size
#build the colors from a color map
colors = mpl.cm.jet(C)
#there are easier ways to do static alpha, but this allows
#per point alpha later on.
colors[:,3] = 0.1
fig, ax = plt.subplots()
fig.show()
background = fig.canvas.copy_from_bbox(ax.bbox)
#this makes the base collection
coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None', marker='D')
fig.canvas.draw()
sTime = time.time()
for i in range(10):
    print(i)
    #don't change anything, but redraw the plot
    coll.set_facecolors(colors)
    coll.set_offsets( np.array([X,Y]).T )
    #for starters lets not change anything!
    fig.canvas.restore_region(background)
    ax.draw_artist(coll)
    fig.canvas.blit(ax.bbox)
print('%2.1f FPS'%( (time.time()-sTime)/10 ))
This results in a slower 0.7 fps. I wanted to try using CircleCollection or RegularPolygonCollection, as this would allow me to change the sizes easily, and I don't care about changing the marker. But, I can't get either to draw so I have no idea if they'd be faster. So, at this point I'm looking for ideas.
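On the question of per-point sizes: the collection returned by `scatter` is a `PathCollection`, which has a `set_sizes` method alongside `set_offsets` and `set_facecolors`, so sizes can be updated in place without rebuilding the collection. The sketch below only shows that the update works; it makes no claim about speed. The Agg backend, the doubling of the sizes and the 0.1 shift in x are assumptions made purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 10000
X = rng.standard_normal(n)                  # x pos
Y = rng.standard_normal(n)                  # y pos
S = (1 + rng.standard_normal(n) ** 2) * 3   # size, in points**2

fig, ax = plt.subplots()
coll = ax.scatter(X, Y, s=S, marker='D', edgecolor='none')
fig.canvas.draw()

# update the existing collection in place instead of calling scatter again
new_sizes = S * 2.0                               # illustrative change only
coll.set_sizes(new_sizes)                         # per-point marker areas
coll.set_offsets(np.column_stack([X + 0.1, Y]))   # per-point positions
fig.canvas.draw()

# confirm the collection now holds the new sizes
sizes_match = np.allclose(coll.get_sizes(), new_sizes)
print(sizes_match)
```

The `set_sizes` call can sit next to `set_facecolors` and `set_offsets` in the blitting loop above, which would let size, color and position all change per frame.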
We are actively working on performance for large matplotlib scatter plots. I'd encourage you to get involved in the conversation (http://matplotlib.1069221.n5.nabble.com/mpl-1-2-1-Speedup-code-by-removing-startswith-calls-and-some-for-loops-td41767.html) and, even better, test out the pull request that has been submitted to make life much better for a similar case (https://github.com/matplotlib/matplotlib/pull/2156).
HTH