 

Speed up Matplotlib?

I've read here that matplotlib is good at handling large data sets. I'm writing a data processing application with matplotlib plots embedded in wx, and I've found matplotlib to be TERRIBLE at handling large amounts of data, both in terms of speed and memory. Does anyone know a way to speed up matplotlib (or reduce its memory footprint) other than downsampling your inputs?

To illustrate how bad matplotlib is with memory consider this code:

import pylab
import numpy
a = numpy.arange(int(1e7)) # only 10,000,000 32-bit integers (~40 MB in memory)
# watch your system memory now...
pylab.plot(a) # this uses over 230 ADDITIONAL MB of memory
asked Feb 12 '11 by David Morton


2 Answers

Downsampling is a good solution here: plotting 10M points consumes a lot of memory and time in matplotlib. If you know how much memory is acceptable, you can downsample based on that budget. For example, if 1M points takes an additional 23 MB of memory and you find that acceptable in terms of space and time, downsample so that you always stay below 1M points:

import scipy.signal

MAX_POINTS = int(1e6)  # the 1M-point budget from the example above
if len(a) > MAX_POINTS:
    a = scipy.signal.decimate(a, len(a) // MAX_POINTS + 1)
pylab.plot(a)

Or something along those lines (the snippet above may downsample more aggressively than you'd like).
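If pulling in SciPy feels heavy, here is a minimal sketch of the same idea with a plain stride slice; it skips the anti-aliasing filter that decimate applies, and MAX_POINTS is the same illustrative budget as above, not a matplotlib limit:

import numpy as np
import pylab

MAX_POINTS = int(1e6)  # illustrative budget
a = np.arange(int(1e7))
if len(a) > MAX_POINTS:
    step = len(a) // MAX_POINTS + 1
    a = a[::step]  # keep every step-th sample; slicing returns a view, no copy
pylab.plot(a)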

answered Nov 15 '22 by brandx

I'm often interested in the extreme values too, so before plotting large chunks of data I proceed this way:

import numpy as np

s = np.random.normal(size=(int(1e7),))  # size must be an integer
decimation_factor = 10

# Reduce each block of 10 consecutive samples to its maximum,
# so extreme values survive the downsampling.
s = np.max(s.reshape(-1, decimation_factor), axis=1)

# To check the final size
s.shape

Of course np.max is just one example of an extreme-value function.

P.S. With numpy "stride tricks" it should be possible to avoid copying data around during the reshape.
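For reference, a minimal sketch of the stride-tricks approach using numpy.lib.stride_tricks.as_strided (note that reshaping a contiguous 1-D array already returns a view, so this mainly matters for non-contiguous data or overlapping windows; the variable names are illustrative):

import numpy as np
from numpy.lib.stride_tricks import as_strided

s = np.random.normal(size=(int(1e7),))
decimation_factor = 10
n_blocks = s.size // decimation_factor

# Reinterpret the 1-D buffer as (n_blocks, decimation_factor) without copying:
# rows advance by decimation_factor elements, columns by one element.
blocks = as_strided(
    s,
    shape=(n_blocks, decimation_factor),
    strides=(s.strides[0] * decimation_factor, s.strides[0]),
)
s_max = blocks.max(axis=1)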

answered Nov 15 '22 by Eraldo P.