 

Python multiprocessing

This question is more about fact-finding and thought process than about code.

I have many compiled C++ programs that I need to run at different times and with different parameters. I'm looking at using Python multiprocessing to read a job from a job queue (RabbitMQ) and then feed that job to a C++ program to run (maybe via subprocess). I was looking at the multiprocessing module because this will all run on a dual-Xeon server, so I want to take full advantage of the multiple processors on my server.

The Python program would be the central manager and would simply read jobs from the queue, spawn a process (or subprocess?) with the appropriate C++ program to run the job, get the results (subprocess stdout & stderr), feed those to a callback, and put the process back in a queue of processes waiting for the next job to run.
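To make that concrete, here is a rough sketch of the manager loop I have in mind (using the pika RabbitMQ client; the queue name, binary path, and callback here are all placeholders):

import subprocess

import pika  # RabbitMQ client; assumed here, not part of the stdlib

def handle_result(out, err):
    # Placeholder callback for job results
    print('stdout:', out.decode(), 'stderr:', err.decode())

def on_job(channel, method, properties, body):
    # Each message body carries the parameters for one C++ run
    args = body.decode().split()
    proc = subprocess.Popen(['./my_cpp_program'] + args,  # placeholder binary
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    handle_result(out, err)
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='jobs')  # placeholder queue name
channel.basic_consume(queue='jobs', on_message_callback=on_job)
channel.start_consuming()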

First, does this sound like a valid strategy?

Second, are there any examples of something similar to this?

Thank you in advance.

asked Jun 22 '11 by FFredrixson
2 Answers

The Python program would be the central manager and would simply read jobs from the queue, spawn a process (or subprocess?) with the appropriate C++ program to run the job, get the results (subprocess stdout & stderr), feed those to a callback, and put the process back in a queue of processes waiting for the next job to run.

You don't need the multiprocessing module for this. The multiprocessing module is good for running Python functions as separate processes. To run a C++ program and read results from stdout, you'd only need the subprocess module. The queue could be a list, and your Python program would simply loop while the list is non-empty.
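For example, a minimal sketch of that simpler approach (./my_cpp_program stands in for one of your binaries; the job list is a placeholder):

import subprocess

# A plain list standing in for the job queue; each entry is one argument list
jobs = [['./my_cpp_program', '1'], ['./my_cpp_program', '2']]

while jobs:
    cmd = jobs.pop(0)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    print('ran', cmd, '-> stdout:', out.decode(), 'stderr:', err.decode())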


However, if you want to

  1. spawn multiple worker processes
  2. have them read from a common queue
  3. use the arguments from the queue to spawn C++ programs (in parallel)
  4. use the output of the C++ programs to put new items in the queue

then you could do it with multiprocessing like this:

test.py:

import multiprocessing as mp
import subprocess
import sys

def worker(q):
    while True:
        # Get an argument from the queue
        x = q.get()

        # You might change this to run your C++ program instead
        proc = subprocess.Popen(
            [sys.executable, 'test2.py', str(x)], stdout=subprocess.PIPE)
        out, err = proc.communicate()

        print('{name}: using argument {x} outputs {o}'.format(
            x=x, name=mp.current_process().name, o=out.decode().strip()))

        q.task_done()

        # Put a new argument into the queue
        q.put(int(out))

def main():
    q = mp.JoinableQueue()

    # Put some initial values into the queue
    for t in range(1, 3):
        q.put(t)

    # Create and start a pool of worker processes
    for i in range(3):
        p = mp.Process(target=worker, args=(q,))
        p.daemon = True
        p.start()
    q.join()
    print("Finished!")

if __name__ == '__main__':
    main()

test2.py (a simple substitute for your C++ program):

import time
import sys

x=int(sys.argv[1])
time.sleep(0.5)
print(x+3)

Running test.py might yield something like this:

Process-1: using argument 1 outputs 4
Process-3: using argument 3 outputs 6
Process-2: using argument 2 outputs 5
Process-3: using argument 6 outputs 9
Process-1: using argument 4 outputs 7
Process-2: using argument 5 outputs 8
Process-3: using argument 9 outputs 12
Process-1: using argument 7 outputs 10
Process-2: using argument 8 outputs 11
Process-1: using argument 10 outputs 13

Notice that the numbers in the right-hand column are fed back into the queue and are (eventually) used as arguments to test2.py, showing up as numbers in the left-hand column.

answered Oct 29 '22 by unutbu


First, does this sound like a valid strategy?

Yes.

Second, are there any examples of something similar to this?

Celery, a distributed task queue for Python that uses RabbitMQ as its default message broker and manages a pool of worker processes for you.
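For example, a minimal sketch of a Celery task wrapping one of the C++ programs (the broker URL and binary path are placeholders):

import subprocess

from celery import Celery

# Placeholder broker URL; Celery talks AMQP to RabbitMQ out of the box
app = Celery('jobs', broker='amqp://guest@localhost//')

@app.task
def run_cpp(args):
    # args is a list of strings: the parameters for one C++ run
    proc = subprocess.Popen(['./my_cpp_program'] + args,  # placeholder binary
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return out.decode(), err.decode()

Assuming the file is saved as jobs.py, a worker started with celery -A jobs worker then pulls run_cpp calls off RabbitMQ, with one worker process per core by default.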

answered Oct 29 '22 by S.Lott