I've read here that matplotlib is good at handling large data sets. I'm writing a data processing application and have embedded matplotlib plots into wx, and I've found matplotlib to be TERRIBLE at handling large amounts of data, both in terms of speed and memory. Does anyone know a way to speed up matplotlib or reduce its memory footprint, other than downsampling the inputs?
To illustrate how bad matplotlib is with memory consider this code:
import pylab
import numpy
a = numpy.arange(int(1e7)) # only 10,000,000 32-bit integers (~40 MB in memory)
# watch your system memory now...
pylab.plot(a) # this uses over 230 ADDITIONAL MB of memory
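To put a number on that overhead yourself, here is a small sketch (assuming the psutil package, which is not part of the original post, is installed):

import os
import psutil
import numpy
import pylab

proc = psutil.Process(os.getpid())
before = proc.memory_info().rss  # resident memory before plotting

a = numpy.arange(int(1e7))
pylab.plot(a)

after = proc.memory_info().rss
print("additional memory used: %.0f MB" % ((after - before) / 1e6))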
If you are re-drawing the plot repeatedly (e.g. in an animation), the blit keyword is an important one: it tells the animation to re-draw only the pieces of the plot that have changed. The time saved with blit=True means the animation displays much more quickly. You can end with an optional save call, and then a show call to display the result.
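A minimal sketch of what that looks like with FuncAnimation (the data, frame count, and interval here are made up for illustration):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 1000)
line, = ax.plot(x, np.sin(x))

def update(frame):
    # only the line's y-data changes; with blit=True only this artist is re-drawn
    line.set_ydata(np.sin(x + frame / 10.0))
    return line,

ani = FuncAnimation(fig, update, frames=200, interval=30, blit=True)
# ani.save('demo.mp4')  # optional save
plt.show()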
Downsampling is a good solution here -- plotting 10M points consumes a lot of memory and time in matplotlib. If you know how much memory is acceptable, you can downsample based on that amount. For example, say 1M points takes 23 additional MB of memory and you find that acceptable in terms of space and time; then you should downsample so that you always stay below 1M points:
import scipy.signal

max_points = int(1e6)  # downsample to stay below ~1M points
if len(a) > max_points:
    a = scipy.signal.decimate(a, len(a) // max_points + 1)
pylab.plot(a)
Or something like the above snippet (the above may downsample too aggressively for your taste).
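If the anti-aliasing filter inside scipy.signal.decimate is more than you need, plain slicing is a cheaper (though cruder) alternative; this is just a sketch of that idea, not part of the original answer:

max_points = int(1e6)
if len(a) > max_points:
    step = len(a) // max_points + 1
    a = a[::step]  # keep every step-th sample; no filtering, so aliasing is possible
pylab.plot(a)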
I'm often interested in the extreme values too, so before plotting large chunks of data I proceed in this way:
import numpy as np

s = np.random.normal(size=int(1e7))  # size must be an int, not a float
decimation_factor = 10
# keep the maximum of each block of `decimation_factor` samples
s = np.max(s.reshape(-1, decimation_factor), axis=1)
s.shape  # check the final size: (1000000,)
Of course, np.max is just one example of an extreme-value reduction function.
P.S. With numpy "strides tricks" it should be possible to avoid copying data around during the reshape.