
How to fix the python multiprocessing matplotlib savefig() issue?

I want to speed up matplotlib's savefig() for many figures by using the multiprocessing module, and I am trying to benchmark parallel against sequential saving.

Here is the code:

# -*- coding: utf-8 -*-
"""
Compare the time of matplotlib savefig() in parallel and sequence
"""

import numpy as np
import matplotlib.pyplot as plt
import multiprocessing
import time


def gen_fig_list(n):
    ''' generate a list to contain n demo scatter figure object '''
    plt.ioff()
    fig_list = []
    for i in range(n):
        plt.figure();
        dt = np.random.randn(5, 4);
        fig = plt.scatter(dt[:,0], dt[:,1], s=abs(dt[:,2]*1000), c=abs(dt[:,3]*100)).get_figure()
        fig.FM_figname = "img"+str(i)
        fig_list.append(fig)
    plt.ion()
    return fig_list


def savefig_worker(fig, img_type, folder):
    file_name = folder+"\\"+fig.FM_figname+"."+img_type
    fig.savefig(file_name, format=img_type, dpi=fig.dpi)
    return file_name


def parallel_savefig(fig_list, folder):
    proclist = []
    for fig in fig_list:
        print fig.FM_figname,
        p = multiprocessing.Process(target=savefig_worker, args=(fig, 'png', folder)) # cause error
        proclist.append(p)
        p.start()

    for i in proclist:
        i.join()



if __name__ == '__main__':
    folder_1, folder_2 = 'Z:\\A1', 'Z:\\A2'
    fig_list = gen_fig_list(10)

    t1 = time.time()
    parallel_savefig(fig_list,folder_1)
    t2 = time.time()
    print '\nMulprocessing time    : %0.3f'%((t2-t1))

    t3 = time.time()
    for fig in fig_list:
        savefig_worker(fig, 'png', folder_2)
    t4 = time.time()
    print 'Non_Mulprocessing time: %0.3f'%((t4-t3))

When I run it, I get the error "This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information." It is caused by the line p = multiprocessing.Process(target=savefig_worker, args=(fig, 'png', folder)).

Why does this happen, and how can I solve it?

(Windows XP + Python: 2.6.1 + Numpy: 1.6.2 + Matplotlib: 1.2.0)

EDIT: (added the error message from Python 2.7.3)

When run in IDLE on Python 2.7.3, it gives the error message below:

>>> 
img0

Traceback (most recent call last):
  File "C:\Documents and Settings\Administrator\desktop\mulsavefig_pilot.py", line 61, in <module>
    proc.start()
  File "d:\Python27\lib\multiprocessing\process.py", line 130, in start

  File "d:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "d:\Python27\lib\pickle.py", line 748, in save_global
    (obj, module, name))
PicklingError: Can't pickle <function notify_axes_change at 0x029F5030>: it's not found as matplotlib.backends.backend_qt4.notify_axes_change

EDIT: (My solution demo)

Inspired by "Matplotlib: simultaneous plotting in multiple threads":

# -*- coding: utf-8 -*-
"""
Compare the time of matplotlib savefig() in parallel and sequence
"""

import numpy as np
import matplotlib.pyplot as plt
import multiprocessing
import time


def gen_data(fig_qty, bubble_qty):
    ''' generate data for fig drawing '''
    dt = np.random.randn(fig_qty, bubble_qty, 4)
    return dt


def parallel_savefig(draw_data, folder):
    ''' prepare data and pass to worker '''

    pool = multiprocessing.Pool()

    fig_qty = len(draw_data)
    fig_para = zip(range(fig_qty), draw_data, [folder]*fig_qty)

    pool.map(fig_draw_save_worker, fig_para)
    return None


def fig_draw_save_worker(args):
    seq, dt, folder = args
    plt.figure()
    fig = plt.scatter(dt[:,0], dt[:,1], s=abs(dt[:,2]*1000), c=abs(dt[:,3]*100), alpha=0.7).get_figure()
    plt.title('Plot of a scatter of %i' % seq)
    fig.savefig(folder+"\\"+'fig_%02i.png' % seq)
    plt.close()
    return None


if __name__ == '__main__':
    folder_1, folder_2 = 'A1', 'A2'
    fig_qty, bubble_qty =  500, 100
    draw_data = gen_data(fig_qty, bubble_qty)

    print 'Mulprocessing  ...   ',
    t1 = time.time()
    parallel_savefig(draw_data, folder_1)
    t2 = time.time()
    print 'Time : %0.3f'%((t2-t1))

    print 'Non_Mulprocessing .. ', 
    t3 = time.time()
    for para in zip(range(fig_qty), draw_data, [folder_2]*fig_qty):
        fig_draw_save_worker(para)
    t4 = time.time()
    print 'Time : %0.3f'%((t4-t3))

    print 'Speed Up: %0.1fx'%(((t4-t3)/(t2-t1)))

asked Mar 20 '13 at 13:03 by bigbug



2 Answers

You can try moving all of the matplotlib code (including the import) into a function.

  1. Make sure you don't have an import matplotlib or import matplotlib.pyplot as plt at the top of your code.

  2. Create a function that does all the matplotlib work, including the import.

Example:

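import numpy as np
from multiprocessing import Pool

def graphing_function(graph_data):
    # Import matplotlib inside the worker so each child process sets it up itself
    import matplotlib.pyplot as plt
    plt.figure()
    plt.hist(graph_data.data)
    plt.savefig(graph_data.filename)
    plt.close()
    return

if __name__ == '__main__':
    # data_list is your own list of objects with .data and .filename attributes
    pool = Pool(4)
    pool.map(graphing_function, data_list)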

answered Oct 04 '22 at 04:10 by Hidden Name


It is not really a bug, per se, more of a limitation.

The explanation is in the last line of your error message:

PicklingError: Can't pickle <function notify_axes_change at 0x029F5030>: it's not found as matplotlib.backends.backend_qt4.notify_axes_change

It is telling you that some elements of the figure objects cannot be pickled, and pickling is how multiprocessing passes data between processes. The objects are pickled in the main process, shipped as pickles, and then reconstructed on the other side. Even if you fixed this exact issue (perhaps by using a different backend, or by stripping off the offending function, which might break things in other ways), I am pretty sure there are core parts of the Figure, Axes, or Canvas objects that cannot be pickled.
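
To see the same kind of failure in isolation, here is a minimal sketch (written for Python 3, standard library only). The names can_pickle, make_callback, and FakeFigure are hypothetical and made up for illustration; FakeFigure is only a stand-in for an object that holds a callback, not a description of how matplotlib's Figure is actually built.

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, the step applied to data sent to workers."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_callback():
    # A function defined inside another function cannot be looked up by name
    # at module level, so pickle has no way to store a reference to it.
    def notify(event):
        return event
    return notify


class FakeFigure(object):
    """Hypothetical stand-in for an object that carries a callback."""

    def __init__(self):
        self.callback = make_callback()


plain_data = {"seq": 3, "values": [0.1, 0.2, 0.3]}
print(can_pickle(plain_data))    # True: plain containers and numbers pickle fine
print(can_pickle(FakeFigure()))  # False: the nested callback blocks pickling
```

A quick check like this on whatever you plan to hand to a worker can tell you, before any process is started, whether the arguments will make it across.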

As @bigbug points out, "Matplotlib: simultaneous plotting in multiple threads" shows an example of how to get around this limitation. The basic idea is that you push your entire plotting routine off to the subprocess, so you only push numpy arrays and maybe some configuration information across the process boundary.
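
A rough Python 3 sketch of that idea follows. It is not the code from the linked question; build_tasks, draw_and_save, and save_all are hypothetical names, the random scatter data is a placeholder, and selecting the non-interactive "Agg" backend is an assumption that fits when figures are only written to files. Only small tuples of an index and a path cross the process boundary.

```python
import multiprocessing
import os


def build_tasks(n_figures, folder):
    """Return picklable (index, output path) tuples, one per figure."""
    return [(i, os.path.join(folder, "fig_%02d.png" % i)) for i in range(n_figures)]


def draw_and_save(task):
    """Worker: create, save, and close one figure entirely inside the child process."""
    seq, path = task
    # matplotlib is imported here so that each worker sets it up for itself.
    import matplotlib
    matplotlib.use("Agg")  # assumption: a non-GUI backend is enough for saving files
    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.RandomState(seq)  # placeholder data, seeded per figure
    dt = rng.randn(100, 4)
    fig, ax = plt.subplots()
    ax.scatter(dt[:, 0], dt[:, 1], s=abs(dt[:, 2]) * 1000,
               c=abs(dt[:, 3]) * 100, alpha=0.7)
    ax.set_title("Scatter plot %d" % seq)
    fig.savefig(path)
    plt.close(fig)  # release the figure so memory does not grow per task
    return path


def save_all(n_figures, folder, processes=None):
    """Draw and save n_figures in a pool of worker processes; return the saved paths."""
    os.makedirs(folder, exist_ok=True)
    tasks = build_tasks(n_figures, folder)
    with multiprocessing.Pool(processes) as pool:
        return pool.map(draw_and_save, tasks)
```

To try it, call something like save_all(20, "A1") from under an if __name__ == '__main__': guard. The guard matters on Windows, where each worker re-imports the main module.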

answered Oct 04 '22 at 04:10 by tacaswell