
Python What is the difference between a Pool of worker processes and just running multiple Processes?

I am not sure when to use a pool of workers versus multiple processes started by hand.

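from multiprocessing import Process

processes = []

# start 4 processes, one per loop iteration (some_function is defined elsewhere)
for m in range(1, 5):
    p = Process(target=some_function)
    p.start()
    processes.append(p)

# wait for all of them to finish
for p in processes:
    p.join()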

vs

from multiprocessing import Pool

if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:
        pool_outputs = pool.map(another_function, inputs)

whiteSkar asked Oct 30 '15 22:10


2 Answers

As PyMOTW puts it:

The Pool class can be used to manage a fixed number of workers for simple cases where the work to be done can be broken up and distributed between workers independently.

The return values from the jobs are collected and returned as a list.

The pool arguments include the number of processes and a function to run when starting the task process (invoked once per child).
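To make the quoted description of the pool arguments concrete, here is a minimal sketch. The names `init_worker`, `tagged_double` and `run_pool` are hypothetical and only for illustration; `processes`, `initializer` and `initargs` are the actual `Pool` parameters.

```python
import os
from multiprocessing import Pool

# Hypothetical per-worker state, filled in once per child by the initializer.
_worker_label = None


def init_worker(prefix):
    """Runs once in each worker process when the pool starts it."""
    global _worker_label
    _worker_label = f"{prefix}-{os.getpid()}"


def tagged_double(x):
    """Task function: called once per input element, inside a worker."""
    return _worker_label, x * 2


def run_pool(inputs, workers=2):
    # processes: number of workers; initializer/initargs: per-child setup.
    with Pool(processes=workers,
              initializer=init_worker,
              initargs=("worker",)) as pool:
        # The return values are collected and returned as a list.
        return pool.map(tagged_double, inputs)


if __name__ == "__main__":
    for label, value in run_pool(range(5)):
        print(label, value)
```

Each result carries the label set by the initializer, so the output shows which worker handled which element.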

Have a look at the examples given there to better understand its uses, features and parameters.

Basically the Pool is a helper, easing the management of the processes (workers) in those cases where all they need to do is consume common input data, process it in parallel and produce a joint output.

The Pool does quite a few things that you would otherwise have to code yourself (not too hard, but it is convenient to have a ready-made solution), namely:

  • the splitting of the input data
  • the target process function is simplified: it can be designed to expect one input element only. The Pool calls it with each element of the subset allocated to that worker
  • waiting for the workers to finish their job (i.e. joining the processes)
  • ...
  • merging the output of each worker to produce the final output
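For comparison, here is a rough sketch of what doing those steps by hand with `Process` might look like. The helper names (`split`, `worker`, `manual_map`) are made up for this example, and the contiguous chunking is just one possible strategy, not necessarily what `Pool` does internally.

```python
from multiprocessing import Process, Queue


def square(x):
    """Per-element work; the same function could be passed to Pool.map."""
    return x * x


def split(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def worker(index, chunk, results):
    """Apply square to every element of one chunk and report back."""
    results.put((index, [square(x) for x in chunk]))


def manual_map(items, workers=4):
    results = Queue()
    chunks = split(list(items), workers)
    procs = [Process(target=worker, args=(i, c, results))
             for i, c in enumerate(chunks)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    partial = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    # Merge the per-worker outputs back in input order.
    return [y for i in range(len(procs)) for y in partial[i]]


if __name__ == "__main__":
    print(manual_map(range(10)))
```

With a Pool, all of `split`, `worker` and `manual_map` collapse into a single `pool.map(square, items)` call.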

Pynchia answered Oct 23 '22 03:10


The information below might help you understand the difference between Pool and Process in Python's multiprocessing module:

Pool:

  1. When you have a chunk of data to process, you can use the Pool class.
  2. Only the processes currently executing are kept in memory.
  3. I/O operations: a worker waits until the I/O operation completes and does not schedule another process in the meantime. This might increase execution time.
  4. Uses a FIFO scheduler.

Process:

  1. When you have a small amount of data or few functions, and fewer repetitive tasks to do.
  2. It puts all the processes in memory. Hence, for larger tasks, it might exhaust memory.
  3. I/O operations: the Process class suspends a process that is executing I/O and schedules another process in parallel.
  4. Uses a FIFO scheduler.
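The memory point (item 2 in both lists) can be made visible by counting how many distinct OS processes each approach uses for the same number of tasks. This is only a sketch under my own assumptions: the function names are hypothetical, and it illustrates the process count only, not the scheduling or I/O claims.

```python
import os
from multiprocessing import Pool, Process, Queue


def report_pid(_task):
    """Return the PID of the process that handled this task."""
    return os.getpid()


def report_pid_to_queue(task, results):
    results.put(report_pid(task))


def pids_with_pool(n_tasks, workers=2):
    # A fixed set of workers is reused for all tasks.
    with Pool(processes=workers) as pool:
        return set(pool.map(report_pid, range(n_tasks)))


def pids_with_processes(n_tasks):
    # One Process object, and one OS process, per task.
    results = Queue()
    procs = [Process(target=report_pid_to_queue, args=(t, results))
             for t in range(n_tasks)]
    for p in procs:
        p.start()
    pids = {results.get() for _ in procs}
    for p in procs:
        p.join()
    return pids


if __name__ == "__main__":
    # The pool never uses more PIDs than it has workers.
    print("Pool:", len(pids_with_pool(8)), "distinct PIDs")
    # One process per task, so normally one PID per task.
    print("Process:", len(pids_with_processes(8)), "distinct PIDs")
```

With many tasks, the per-task `Process` approach keeps growing the number of live processes, while the pool stays at its fixed size.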

ANK answered Oct 23 '22 04:10