
Why does pool run the entire file multiple times?

I'm trying to understand the output from this Python 2.7.5 example script:

import time
from multiprocessing import Pool

print(time.strftime('%Y-%m-%d %H:%M', time.localtime(time.time())))
props2 = [
    '170339',
    '170357',
    '170345',
    '170346',
    '171232',
    '170363',
]

def go(x):
    print(x)

if __name__ == '__main__':
    pool = Pool(processes=3)
    pool.map(go, props2)

print(time.strftime('%Y-%m-%d %H:%M', time.localtime(time.time())))  

This yields the output:

2015-08-06 10:13

2015-08-06 10:13

2015-08-06 10:13

170339

170357

170345

170346

171232

170363

2015-08-06 10:13

2015-08-06 10:13

2015-08-06 10:13

My questions are:

A) Why does the time print three times at the beginning and the end? I would have expected it to print the start time, and then the end time.

B) The real question - How do I get it to run one command multiple times, but all the others a single time?

user2872147 asked Aug 06 '15 14:08



2 Answers

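On platforms where multiprocessing starts its workers as fresh interpreters (Windows, in particular), each worker imports your script's __main__ module, and importing a file executes all of its top-level code again. That is why both timestamp lines show up more than once. On systems that fork, the workers inherit the parent's state and nothing is re-imported. The if __name__ == '__main__' guard matters for the same reason: without it, each worker would try to create its own pool while importing the file. On Python 3 this is reported as a RuntimeError rather than looping forever.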

For the real question:

In Python scripts, I try to keep everything except imports and function definitions out of the global scope. I use the template below for all my scripts.

import sys

def main(argv):
    # main logic here
    pass

if __name__ == '__main__':
    main(sys.argv)

With this layout, a script that contains reusable functions can still be imported into another script, even though it has a main function, because importing it does not run the main logic.
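Applied to the script in the question, that template might look like the sketch below. The names timestamp, main and PROPS are my own, not from the question, and go returns its argument only so that pool.map has something to collect. Everything that should happen once lives in main(); only go() is handed to the pool.

```python
import time
from multiprocessing import Pool

# Hypothetical name for the list from the question.
PROPS = ['170339', '170357', '170345', '170346', '171232', '170363']


def timestamp():
    # Format the current local time the same way the question does.
    return time.strftime('%Y-%m-%d %H:%M', time.localtime())


def go(x):
    # Runs in a worker process, once per item.
    print(x)
    return x


def main():
    print(timestamp())  # start time, printed only by the parent
    pool = Pool(processes=3)
    try:
        results = pool.map(go, PROPS)  # the only work sent to the workers
    finally:
        pool.close()
        pool.join()
    print(timestamp())  # end time, printed only by the parent
    return results


if __name__ == '__main__':
    main()
```

A worker that re-imports this file finds only definitions at the top level, so the timestamps should print once regardless of how the workers are started.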

Leland Barton answered Sep 20 '22 07:09


Multiprocessing needs to import your script in each subprocess in order to use the go() function. Importing the script runs all of its top-level code, so each import prints the date again. If you only want something to run in the main script, put it in the if __name__ == '__main__' block.
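One way to see this for yourself is to print the module name and process ID from the top level. This is a small diagnostic sketch, not part of the original script, and describe is a made-up helper. Where workers are started by re-importing the script (Windows, for instance), the line should appear once per process, with a name other than '__main__' in the workers. Where workers are forked, it should appear only once.

```python
import os
from multiprocessing import Pool


def describe():
    # Report which module name and process are executing the top-level code.
    return 'top-level code running as %r in pid %d' % (__name__, os.getpid())


# Module-level statement: executed every time this file is run or imported.
print(describe())


def go(x):
    # Trivial worker function; the pool only needs something to call.
    return x


if __name__ == '__main__':
    # Only the process that was launched directly reaches this block.
    pool = Pool(processes=3)
    pool.map(go, ['170339', '170357', '170345'])
    pool.close()
    pool.join()
```

If the line shows up several times with different process IDs, any other top-level statement, such as the two timestamp prints in the question, is being repeated in the same way.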

Kevin answered Sep 23 '22 07:09