
Matplotlib multiprocessing fonts corruption using savefig

I have had trouble with multiprocessing in Matplotlib since version 1.5: text in the saved figures randomly jumps away from its intended position. [Image: example of the corrupted output]

A simple example that reproduces the bug:

import multiprocessing
import matplotlib.pyplot as plt

fig = plt.figure()

def plot(i):
    fig = plt.gcf()
    plt.plot([],[])
    fig.savefig('%d.png' % i)

plot(0)
pool = multiprocessing.Pool(4)
pool.map(plot, range(10))

If the order is reversed, so that the pool is created before the first direct call to plot,

pool = multiprocessing.Pool(4)
plot(0)
pool.map(plot, range(10))

then it works, but this workaround is useless for my purpose.

Thank you.

Asked by Tomas, Feb 13 '16 14:02


2 Answers

I recently ran into this same problem while testing ways to produce large numbers of plots in parallel. I haven't found a solution that uses the multiprocessing module, but I do not see the same errors with the Parallel Python package (http://www.parallelpython.com/). In my early tests it is about 50% slower than multiprocessing, but still a significant speedup over serial plotting. It is also a little finicky about module imports, so I would ultimately prefer a solution based on multiprocessing, but for now this is a passable workaround (for me at least). That said, I'm fairly new to parallel processing, so I may be missing some nuances of the two approaches. The script below runs the same plotting job with both packages and times each.

###############################################################################
import os
import sys
import time
#import numpy as np
import numpy    # Importing with 'as' doesn't work with Parallel Python
#import matplotlib.pyplot as plt
import matplotlib.pyplot    # Importing with 'as' doesn't work with Parallel Python
import pp
import multiprocessing as mp
###############################################################################
path1='./Test_PP'
path2='./Test_MP'
nplots=100
###############################################################################
def plotrandom(plotid,N,path):
    numpy.random.seed() # Required for multiprocessing module but not Parallel Python...
    x=numpy.random.randn(N)
    y=x**2
    matplotlib.pyplot.scatter(x,y)
    matplotlib.pyplot.savefig(os.path.join(path,'test_%d.png'%(plotid)),dpi=150)
    matplotlib.pyplot.close('all')
###############################################################################
# Parallel Python implementation
tstart_1=time.time()
if not os.path.exists(path1):
    os.makedirs(path1)

ppservers = ()

if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])
    job_server = pp.Server(ncpus, ppservers=ppservers)
else:
    job_server = pp.Server(ppservers=ppservers)

print("Starting Parallel Python v2 with %d workers" % job_server.get_ncpus())

jobs = [(input_i, job_server.submit(plotrandom,(input_i,10,path1),(),("numpy","matplotlib.pyplot"))) for input_i in range(nplots)]

for input_i, job in jobs:
    job()

tend_1=time.time()
t1=tend_1-tstart_1
print('Parallel Python = %0.5f sec' % t1)
job_server.print_stats()
###############################################################################
# Multiprocessing implementation
tstart_2=time.time()
if not os.path.exists(path2):
    os.makedirs(path2)

if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])
else:
    ncpus = mp.cpu_count()

print("Starting multiprocessing v2 with %d workers" % ncpus)

pool = mp.Pool(processes=ncpus)
jobs = [pool.apply_async(plotrandom, args=(i,10,path2)) for i in range(nplots)]
results = [r.get() for r in jobs]    # apply_async already submitted the jobs; get() blocks until each one finishes
pool.close()
pool.join()

tend_2=time.time()
t2=tend_2-tstart_2
print('Multiprocessing = %0.5f sec' % t2)
###############################################################################
Answered by GSR, Sep 30 '22 14:09


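I have found a solution. The main cause of the trouble is the font cache, the dictionary _fontd in /matplotlib/backends/backend_agg.py. Worker processes forked after the first plot appear to inherit the fonts already cached there, which would explain why creating the pool before plotting avoids the problem.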

I therefore made the cache key different in each process by adding multiprocessing.current_process().pid to the hash called key in the function _get_agg_font.

If anybody knows a more elegant solution that does not require modifying the matplotlib files, please let me know.
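
One direction that would leave the matplotlib files alone, offered only as an untested sketch, is to start the workers with the "spawn" start method so that they begin as fresh interpreters and do not inherit whatever the parent has cached. This assumes Python 3.4 or later (where multiprocessing.get_context exists); the function names plot and run are illustrative. It also carries a trade-off: spawned workers cannot see a figure created in the parent, so each worker has to build its own.

```python
import multiprocessing


def plot(i):
    # Imported inside the worker so each spawned process loads matplotlib
    # itself instead of relying on state copied from the parent.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, enough for savefig
    import matplotlib.pyplot as plt

    fig = plt.figure()
    plt.plot([], [])
    fig.savefig("%d.png" % i)
    plt.close(fig)


def run(n_workers=4, n_plots=10):
    # "spawn" starts new interpreters rather than forking the parent.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(n_workers) as pool:
        pool.map(plot, range(n_plots))
```

With the spawn start method, run() has to be called from a script's main-module guard (if __name__ == "__main__":), because each worker re-imports the main module on startup.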

Answered by Tomas, Sep 30 '22 14:09
