Since Matplotlib 1.5 I have had trouble using multiprocessing with Matplotlib: the text in the figures jumps randomly around its intended position.
A simple example that reproduces the bug:
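import multiprocessing
import matplotlib.pyplot as plt

fig = plt.figure()

def plot(i):
    # Draw an empty plot on the current figure and save it as <i>.png
    fig = plt.gcf()
    plt.plot([], [])
    fig.savefig('%d.png' % i)

plot(0)  # plot once in the parent process first
pool = multiprocessing.Pool(4)  # the pool is created only after that
pool.map(plot, range(10))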
If the order is reversed, so that the pool is created before the first plain call to plot:
pool = multiprocessing.Pool(4)
plot(0)
pool.map(plot, range(10))
then it works, but this workaround is useless for my purpose.
Thank you.
I've recently run into this same problem while testing ways to render large numbers of plots in parallel. While I haven't found a solution using the multiprocessing module, I do not see the same errors with the Parallel Python package (http://www.parallelpython.com/). It seems to be ~50% slower than the multiprocessing module in my early tests, but it is still a significant speedup over serial plotting. It's also a little finicky about module imports, so I would ultimately prefer a solution using multiprocessing, but for now this is a passable workaround (for me at least). That said, I'm pretty new to parallel processing, so there may be some nuances of the two approaches that I'm missing here.
###############################################################################
import os
import sys
import time
#import numpy as np
import numpy # Importing with 'as' doesn't work with Parallel Python
#import matplotlib.pyplot as plt
import matplotlib.pyplot # Importing with 'as' doesn't work with Parallel Python
import pp
import multiprocessing as mp
###############################################################################
path1='./Test_PP'
path2='./Test_MP'
nplots=100
###############################################################################
def plotrandom(plotid, N, path):
    numpy.random.seed()  # Required for multiprocessing module but not Parallel Python...
    x = numpy.random.randn(N)
    y = x**2
    matplotlib.pyplot.scatter(x, y)
    matplotlib.pyplot.savefig(os.path.join(path, 'test_%d.png' % plotid), dpi=150)
    matplotlib.pyplot.close('all')
###############################################################################
# Parallel Python implementation
tstart_1 = time.time()
if not os.path.exists(path1):
    os.makedirs(path1)
ppservers = ()
if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])
    job_server = pp.Server(ncpus, ppservers=ppservers)
else:
    job_server = pp.Server(ppservers=ppservers)
print("Starting Parallel Python v2 with %d workers" % job_server.get_ncpus())
# submit(func, args, depfuncs, modules): the listed modules are imported in each worker
jobs = [(input_i, job_server.submit(plotrandom, (input_i, 10, path1), (), ("numpy", "matplotlib.pyplot"))) for input_i in range(nplots)]
for input_i, job in jobs:
    job()  # Calling the job object blocks until that job has finished
tend_1 = time.time()
t1 = tend_1 - tstart_1
print('Parallel Python = %0.5f sec' % t1)
job_server.print_stats()
###############################################################################
# Multiprocessing implementation
tstart_2 = time.time()
if not os.path.exists(path2):
    os.makedirs(path2)
if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])
else:
    ncpus = mp.cpu_count()
print("Starting multiprocessing v2 with %d workers" % ncpus)
pool = mp.Pool(processes=ncpus)
jobs = [pool.apply_async(plotrandom, args=(i, 10, path2)) for i in range(nplots)]
results = [r.get() for r in jobs]  # Jobs start when submitted; get() blocks until each one finishes
pool.close()
pool.join()
tend_2 = time.time()
t2 = tend_2 - tstart_2
print('Multiprocessing = %0.5f sec' % t2)
###############################################################################
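A note on the numpy.random.seed() call in plotrandom: as far as I understand it, worker processes created by forking start with a copy of the parent's random generator state, so without reseeding they can all draw the same numbers. The sketch below only simulates that effect with the standard-library random module (a stand-in, not numpy and not real worker processes); inherited_state and worker_draws are names made up for this illustration.

```python
import random

# Stand-in for the parent's generator state at the moment the workers are created.
parent = random.Random(12345)
inherited_state = parent.getstate()

def worker_draws(reseed):
    """Simulate one worker that starts from a copy of the parent's RNG state."""
    rng = random.Random()
    rng.setstate(inherited_state)  # the copy each forked worker would begin with
    if reseed:
        rng.seed()                 # no argument: reseed from OS entropy
    return [rng.random() for _ in range(3)]

# Two simulated workers that do not reseed produce the same stream.
same_a = worker_draws(reseed=False)
same_b = worker_draws(reseed=False)

# Two simulated workers that reseed first produce independent streams.
fresh_a = worker_draws(reseed=True)
fresh_b = worker_draws(reseed=True)

print("without reseeding, streams match:", same_a == same_b)
print("with reseeding, streams match:", fresh_a == fresh_b)
```

This would explain why the reseed matters for the multiprocessing run: without it, several of the test plots could end up with identical scatter data.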
I have found a solution. The main cause of the trouble is the font cache held in the dictionary _fontd in /matplotlib/backends/backend_agg.py.
I therefore made the cache key different for each process by adding multiprocessing.current_process().pid to the hash named key in the function _get_agg_font.
If anybody knows a more elegant solution that does not require modifying the Matplotlib files, please let me know.