matplotlib: faster PDF generation?

Question

I would like to use matplotlib to generate a number of PDF files. My main problem is that matplotlib is slow, taking on the order of 0.5 seconds per file.

I tried to figure out why it takes so long, and I wrote the following test program that just plots a very simple curve as a PDF file:

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

X = range(10)
Y = [ x**2 for x in X ]

for n in range(100):
    fig = plt.figure(figsize=(6,6))
    ax = fig.add_subplot(111)
    ax.plot(X, Y)
    fig.savefig("test.pdf")

But even something as simple as this takes a lot of time: 15–20 seconds in total for 100 PDF files (modern Intel platforms; I tried both Mac OS X and Linux systems).
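One way to see where the time goes is to time figure setup and the `savefig` call separately. The sketch below is only an illustration: the helper name `time_one_pdf` is made up for this example, and it writes to an in-memory buffer so that disk I/O is excluded. The first call may also include one-off initialization costs, so timing several calls gives a fairer picture.

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the question
import matplotlib.pyplot as plt

X = list(range(10))
Y = [x**2 for x in X]


def time_one_pdf():
    """Return (setup_seconds, save_seconds, pdf_bytes) for one figure.

    Hypothetical helper for this sketch, not part of matplotlib.
    """
    t0 = time.perf_counter()
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(111)
    ax.plot(X, Y)
    t1 = time.perf_counter()

    buf = io.BytesIO()               # in-memory target, so disk I/O is excluded
    fig.savefig(buf, format="pdf")   # format must be given when there is no filename
    t2 = time.perf_counter()

    plt.close(fig)                   # release the figure held by pyplot
    return t1 - t0, t2 - t1, buf.getvalue()


setup_s, save_s, pdf_bytes = time_one_pdf()
print(f"setup: {setup_s:.3f} s, savefig: {save_s:.3f} s, {len(pdf_bytes)} bytes")
```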

Are there any tricks and techniques that I can use to speed up PDF generation in matplotlib? Obviously I can use multiple parallel threads on multi-core platforms, but is there anything else that I can do?

pelson · Accepted Answer

If it's practical, you could use multiprocessing to do this (assuming you have multiple cores on your machine):

NOTE: The following code will produce 40 PDFs in the current directory on your machine

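import matplotlib
matplotlib.use('Agg')  # non-interactive backend, as in the question
import matplotlib.pyplot as plt

import multiprocessing


def do_plot(y_pos):
    # Each call builds its own figure and writes <y_pos>.pdf
    fig = plt.figure()
    ax = plt.axes()
    ax.axhline(y_pos)
    fig.savefig('%s.pdf' % y_pos)
    plt.close(fig)  # free the figure so memory does not grow per task


if __name__ == '__main__':
    # The guard is needed on platforms that spawn worker processes
    pool = multiprocessing.Pool()

    for i in range(40):  # range replaces the Python 2 xrange
        pool.apply_async(do_plot, [i])

    pool.close()
    pool.join()
    print('done')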

It doesn't scale perfectly, but I get a significant boost by doing this on my 4 cores (dual-core with hyperthreading):

$> time python multi_pool_1.py
done

real    0m5.218s
user    0m4.901s
sys     0m0.205s

$> time python multi_pool_n.py
done

real    0m2.935s
user    0m9.022s
sys     0m0.420s

I'm sure there is a lot of scope for performance improvements on the pdf backend of mpl, but that is not on the timescale you are after.
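One thing to watch with `apply_async` is that an exception raised inside a worker stays hidden unless you call `.get()` on the returned result object. The following is a minimal sketch of the same pool idea that collects the results and surfaces errors. The helper `render_all`, the `out_dir` parameter and the use of a temporary directory are assumptions added for illustration, not part of the code above.

```python
import multiprocessing
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # no display needed in worker processes
import matplotlib.pyplot as plt


def do_plot(y_pos, out_dir):
    """Draw one horizontal line and save it as <out_dir>/<y_pos>.pdf."""
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.axhline(y_pos)
    path = os.path.join(out_dir, f"{y_pos}.pdf")
    fig.savefig(path)
    plt.close(fig)
    return path


def render_all(y_positions, out_dir, processes=None):
    """Render every plot in a worker pool and return the written paths.

    Hypothetical helper for this sketch.
    """
    with multiprocessing.Pool(processes) as pool:
        pending = [pool.apply_async(do_plot, (y, out_dir)) for y in y_positions]
        # .get() blocks until the task finishes and re-raises any
        # exception from the worker in the parent process
        return [result.get() for result in pending]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = render_all(range(8), tmp)
        print(f"wrote {len(paths)} PDFs")
```

Passing `processes=None` lets the pool pick its default size; a smaller explicit value may be worth trying if the machine is shared.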

HTH,

seberg · Answer

Matplotlib has a lot of overhead for creating the figure and so on, even before saving it to PDF. So if your plots are similar, you can save a lot of "setting up" by reusing elements, just as the matplotlib animation examples do.

You can reuse the figure and axes in this example:

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

X = range(10)
Y = [ x**2 for x in X ]
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111)


for n in range(100):
    ax.clear() # or, even better, just line.remove(), but that may
               # interfere with autoscaling (see below)
    line = ax.plot(X, Y)[0]
    fig.savefig("test.pdf")

Note that this does not help that much. You can save quite a bit more by reusing the lines:

line = ax.plot(X, Y)[0]
for n in range(100):
    # Now instead of plotting, we update the current line:
    line.set_xdata(X)
    line.set_ydata(Y)
    # If autoscaling is necessary:
    ax.relim()
    ax.autoscale()

    fig.savefig("test.pdf")

This is close to twice as fast as the initial example for me. This is only an option if you do similar plots, but if they are very similar, it can speed things up a lot. The matplotlib animation examples may offer inspiration for this kind of optimization.
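The examples above write the same data to the same file each time. As a rough sketch of how the line-reuse idea might look when each file has different data, the following updates only the y values and rescales before every save. The function name `save_series`, the `power_<p>.pdf` file names and the temporary output directory are assumptions made for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

X = list(range(10))


def save_series(exponents, out_dir):
    """Write one PDF per exponent, reusing a single figure and line.

    Hypothetical helper for this sketch.
    """
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(111)
    (line,) = ax.plot(X, X)          # placeholder data, replaced in the loop
    paths = []
    for p in exponents:
        line.set_ydata([x**p for x in X])   # only the y values change
        ax.relim()                   # recompute data limits from the artists
        ax.autoscale_view()          # apply the new limits to the view
        path = os.path.join(out_dir, f"power_{p}.pdf")
        fig.savefig(path)
        paths.append(path)
    plt.close(fig)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    written = save_series([1, 2, 3], tmp)
    print([os.path.basename(p) for p in written])
```

If the axis limits are known in advance, setting them once and dropping the `relim` and `autoscale_view` calls should remove a little more per-file work.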

Tags: python, matplotlib, pdf-generation

Jukka Suomela
