I've read here that matplotlib is good at handling large data sets. I'm writing a data processing application and have embedded matplotlib plots into wx, and I've found matplotlib to be TERRIBLE at handling large amounts of data, both in terms of speed and memory. Does anyone know a way to speed up matplotlib or reduce its memory footprint, other than downsampling the inputs?
To illustrate how bad matplotlib is with memory consider this code:
import pylab
import numpy
a = numpy.arange(int(1e7)) # only 10,000,000 32-bit integers (~40 MB in memory)
# watch your system memory now...
pylab.plot(a) # this uses over 230 ADDITIONAL MB of memory
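To put a number on that overhead yourself, here is a small sketch (assuming the psutil package, which is not part of the original post, is installed):

import os
import psutil
import numpy
import pylab

proc = psutil.Process(os.getpid())
before = proc.memory_info().rss  # resident memory before plotting

a = numpy.arange(int(1e7))
pylab.plot(a)

after = proc.memory_info().rss
print("additional memory used: %.0f MB" % ((after - before) / 1e6))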
If you are re-drawing the plot repeatedly (e.g. in an animation), the blit keyword is an important one: it tells the animation to re-draw only the pieces of the plot that have changed. The time saved with blit=True means the animation displays much more quickly. You can end with an optional save call, and then a show call to display the result.
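A minimal sketch of what that looks like with FuncAnimation (the data, frame count, and interval here are made up for illustration):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 1000)
line, = ax.plot(x, np.sin(x))

def update(frame):
    # only the line's y-data changes; with blit=True only this artist is re-drawn
    line.set_ydata(np.sin(x + frame / 10.0))
    return line,

ani = FuncAnimation(fig, update, frames=200, interval=30, blit=True)
# ani.save('demo.mp4')  # optional save
plt.show()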
Downsampling is a good solution here -- plotting 10M points consumes a lot of memory and time in matplotlib. If you know how much memory is acceptable, you can downsample based on that amount. For example, say 1M points takes 23 additional MB of memory and you find that acceptable in terms of space and time; then you should downsample so that you always stay below 1M points:
import scipy.signal

max_points = int(1e6)  # downsample to stay below ~1M points
if len(a) > max_points:
    a = scipy.signal.decimate(a, len(a) // max_points + 1)
pylab.plot(a)
Or something like the above snippet (the above may downsample too aggressively for your taste).
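If the anti-aliasing filter inside scipy.signal.decimate is more than you need, plain slicing is a cheaper (though cruder) alternative; this is just a sketch of that idea, not part of the original answer:

max_points = int(1e6)
if len(a) > max_points:
    step = len(a) // max_points + 1
    a = a[::step]  # keep every step-th sample; no filtering, so aliasing is possible
pylab.plot(a)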
I'm often interested in the extreme values too, so before plotting large chunks of data I proceed in this way:
import numpy as np

s = np.random.normal(size=int(1e7))  # size must be an int, not a float
decimation_factor = 10
# keep the maximum of each block of `decimation_factor` samples
s = np.max(s.reshape(-1, decimation_factor), axis=1)
s.shape  # check the final size: (1000000,)
Of course, np.max is just one example of an extreme-value reduction function.
P.S. With numpy "strides tricks" it should be possible to avoid copying data around during the reshape.