multiprocessing

Question

I'm currently using the standard multiprocessing in python to generate a bunch of processes that will run indefinitely. I'm not particularly concerned with performance; each thread is simply watching for a different change on the filesystem, and will take the appropriate action when a file is modified.

Currently, I have a solution that works, for my needs, in Linux. I have a dictionary of functions and arguments that looks like:

 job_dict['func1'] = {'target': func1, 'args': (args,)}

For each, I create a process:

 import multiprocessing
 for k in job_dict.keys():
     jobs[k] = multiprocessing.Process(target=job_dict[k]['target'],
                                       args=job_dict[k]['args'])

With this, I can keep track of each one that is running, and, if necessary, restart a job that crashes for any reason.

This does not work in Windows. Many of the functions I'm using are wrappers, using various functools functions, and I get messages about not being able to serialize the functions (see What can multiprocessing and dill do together?). I have not figured out why I do not get this error in Linux, but do in Windows.

If I import dill before starting my processes in Windows, I do not get the serialization error. However, the processes do not actually do anything. I cannot figure out why.

I then switched to the multiprocessing implementation in pathos, but did not find an analog to the simple Process class within the standard multiprocessing module. I was able to generate threads for each job using pathos.pools.ThreadPool. This is not the intended use for map, I'm sure, but it started all the threads, and they ran in Windows:

import pathos
tp = pathos.pools.ThreadPool()
for k in job_dict.keys():
    tp.uimap(job_dict[k]['target'], job_dict[k]['args'])

However, now I'm not sure how to monitor whether a thread is still active, which I'm looking for so that I can restart threads that crash for some reason or another. Any suggestions?

Mike McKerns · Accepted Answer

I'm the pathos and dill author. The Process class is buried deep within pathos at pathos.helpers.mp.process.Process, where mp itself is the actual fork of the multiprocessing library. Everything in multiprocessing should be accessible from there.

Another thing to know about pathos is that it keeps the pool alive for you until you remove it from the held state. This helps reduce overhead in creating "new" pools. To remove a pool, you do:

>>> # create
>>> p = pathos.pools.ProcessPool()
>>> # remove
>>> p.clear()

There's no such mechanism for a Process however.

For multiprocessing, windows is different than Linux and Macintosh… because windows doesn't have a proper fork like on linux… linux can share objects across processes, while on windows there is no sharing… it's basically a fully independent new process created… and therefore the serialization has to be better for the object to pass across to the other process -- just as if you would send the object to another computer. On, linux, you'd have to do this to get the same behavior:

def check(obj, *args, **kwds):
    """check pickling of an object across another process"""
    import subprocess
    fail = True
    try:
        _x = dill.dumps(x, *args, **kwds)
        fail = False
    finally:
        if fail:
            print "DUMP FAILED"
    msg = "python -c import dill; print dill.loads(%s)" % repr(_x)
    print "SUCCESS" if not subprocess.call(msg.split(None,2)) else "LOAD FAILED"

multiprocessing -> pathos.multiprocessing and windows

Tags:

python

pickle

dill

pathos

jdmcbr

1 Answers

Mike McKerns

Recent Activity

Donate For Us