Multiprocessing in Python while limiting the number of running processes

Tags: python, multithreading, multiprocessing

I'd like to run multiple instances of program.py simultaneously, while limiting the number of instances running at the same time (e.g. to the number of CPU cores available on my system). For example, if I have 10 cores and have to do 1000 runs of program.py in total, only 10 instances will be created and running at any given time.

I've tried the multiprocessing module, multithreading, and queues, but nothing seemed to lend itself to an easy implementation. The biggest problem is finding a way to limit the number of processes running simultaneously. This matters because creating 1000 processes at once is effectively a fork bomb. I don't need the results returned from the processes programmatically (they write their output to disk), and the processes all run independently of each other.

Can anyone give me suggestions or an example of how I could implement this in Python, or even Bash? I'd post the code I've written so far using queues, but it doesn't work as intended and might already be heading down the wrong path.

Many thanks.

209

asked Aug 16 '12 22:08

steadfast

2 Answers

I know you mentioned that the Pool.map approach doesn't make much sense to you. map is just an easy way to give the pool a source of work and a callable to apply to each item. The function passed to map can be any entry point that does the actual work on the given argument.

If that doesn't seem right for you, I have a pretty detailed answer over here about using a Producer-Consumer pattern: https://stackoverflow.com/a/11196615/496445

Essentially, you create a Queue and start N workers. Then you either feed the queue from the main thread or create a Producer process that feeds it. The workers just keep taking work from the queue, so there is never more concurrent work happening than the number of processes you started.

You can also put a size limit on the queue, so that it blocks the producer when too much work is already outstanding. This helps if you also need to constrain the speed and resources that the producer consumes.
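To illustrate the queue limit described above, here is a minimal sketch of how a bounded queue pushes back on a producer. The helper name feed and the timeout are hypothetical and exist only so the demo can give up; a real producer would normally call put() without a timeout and simply wait for a worker to free a slot.

```python
import queue
from multiprocessing import Queue


def feed(work_queue, items, timeout=0.1):
    """Put items on the queue; stop at the first put that stays blocked.

    Returns the number of items that were accepted.
    """
    accepted = 0
    for item in items:
        try:
            # put() blocks while the queue is full; the timeout only
            # exists so this demo can give up instead of waiting forever.
            work_queue.put(item, timeout=timeout)
        except queue.Full:
            break
        accepted += 1
    return accepted


if __name__ == "__main__":
    work_queue = Queue(maxsize=3)  # at most 3 outstanding items
    # No worker is draining the queue here, so the fourth put times out.
    print(feed(work_queue, range(10)))  # prints 3
```

With workers calling get() on the same queue, each completed get() frees a slot and the blocked producer continues, so the producer never runs far ahead of the workers.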

The work function that gets called can do anything you want. It can be a wrapper around a system command, or it can import your Python library and run its main routine. There are dedicated process management systems that let you configure arbitrary executables to run under limited resources, but this is just a basic Python approach.
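As a rough sketch of the "wrapper around a system command" case, the snippet below caps how many external commands run at once. It swaps in multiprocessing.pool.ThreadPool for Pool, on the assumption that each worker only waits on a child process, so threads are enough. The names run_command, run_all and max_parallel are hypothetical, and the `python -c pass` commands stand in for the real program.py invocations.

```python
import os
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run_command(cmd):
    """Run one external command and return its exit code."""
    return subprocess.call(cmd)


def run_all(commands, max_parallel=None):
    """Run every command, keeping at most max_parallel alive at once."""
    # Each pool thread blocks on one child process, so the pool size
    # is also the cap on concurrently running children.
    with ThreadPool(processes=max_parallel or os.cpu_count()) as pool:
        return pool.map(run_command, commands)


if __name__ == "__main__":
    # Hypothetical stand-in for the real "program.py" invocations.
    commands = [[sys.executable, "-c", "pass"] for _ in range(8)]
    exit_codes = run_all(commands, max_parallel=4)
    print("failed runs:", sum(code != 0 for code in exit_codes))
```

For the 1000 runs in the question, the command list might be built as `[[sys.executable, "program.py", str(i)] for i in range(1000)]`, assuming program.py accepts a run number as its argument.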

Snippets from that other answer of mine:

Basic Pool:

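from multiprocessing import Pool

def do_work(val):
    # could instantiate some other library class,
    # call out to the file system,
    # or do something simple right here.
    return "FOO: %s" % val

def get_work_args():
    # placeholder: return whatever iterable of arguments describes your work
    return range(10)

if __name__ == "__main__":
    # at most 4 worker processes exist, however many work items there are
    pool = Pool(4)
    work = get_work_args()
    results = pool.map(do_work, work)
    pool.close()
    pool.join()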

Using a process manager and producer:

from multiprocessing import Process, Manager
import time
import itertools

def get_work_args():
    # placeholder: return whatever iterable of items describes your work
    return range(10)

def do_work(in_queue, out_list):
    while True:
        item = in_queue.get()

        # exit signal
        if item is None:
            return

        # fake work
        time.sleep(.5)
        result = item

        out_list.append(result)


if __name__ == "__main__":
    num_workers = 4

    manager = Manager()
    results = manager.list()
    # bounded queue: put() blocks once num_workers items are waiting
    work = manager.Queue(num_workers)

    # start the workers
    pool = []
    for i in range(num_workers):
        p = Process(target=do_work, args=(work, results))
        p.start()
        pool.append(p)

    # produce data, followed by one None per worker as its exit signal
    # this could also be started in a producer process
    # instead of blocking
    iters = itertools.chain(get_work_args(), (None,)*num_workers)
    for item in iters:
        work.put(item)

    for p in pool:
        p.join()

    print(results)

answered Sep 18 '22 15:09

jdi

You should use a process supervisor. One approach is to use the API provided by Circus to do this programmatically. The documentation site is offline at the moment, but I think that's just a temporary problem. Another approach is to use supervisord and set the numprocs parameter of the program to the number of cores you have.

An example using Circus:

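from circus import get_arbiter

# Note: the Circus library documentation passes a list of watcher dicts
# to get_arbiter; "myprogram" is the command to run.
myprogram = {"cmd": "myprogram", "numprocesses": 3}

arbiter = get_arbiter([myprogram])
try:
    arbiter.start()
finally:
    arbiter.stop()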

answered Sep 20 '22 15:09

Tarantula
