
Multiprocessing code repeatedly runs

I want to create a process using the Python multiprocessing module as part of a larger script. (I also want a lot of other things from it, but right now I will settle for this.)

I copied the most basic example from the multiprocessing docs and modified it slightly.

However, everything outside of the if __name__ == '__main__': statement gets repeated every time p.join() is called.

This is my code:

from multiprocessing import Process

data = 'The Data'
print(data)

# worker function definition
def f(p_num):
    print('Doing Process: {}'.format(p_num))

print('start of name == main ')

if __name__ == '__main__':
    print('Creating process')
    p = Process(target=f, args=(data,))
    print('Process made')
    p.start()
    print('process started')
    p.join()
    print('process joined')

print('script finished')

This is what I expected:

The Data
start of name == main 
Creating process
Process made
process started
Doing Process: The Data
process joined
script finished

Process finished with exit code 0

This is the reality:

The Data
start of name == main 
Creating process
Process made
process started
The Data                         <- wrongly repeated line
start of name == main            <- wrongly repeated line
script finished                  <- wrongly executed early line
Doing Process: The Data
process joined
script finished

Process finished with exit code 0

I am not sure whether this is caused by the if statement, by p.join(), or by something else, and by extension why it is happening. Can someone please explain what caused this and why?

For clarity, since some people cannot replicate my problem: I am using Windows Server 2012 R2 Datacenter and Python 3.5.3.

Harry de winton asked Aug 09 '17 13:08


1 Answer

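On Windows, multiprocessing cannot fork the parent, so each child process starts a fresh Python interpreter and imports the parent script (the "spawn" start method). In Python, when you import a script, every top-level statement, i.e. everything not inside a function body, is executed. During that import, __name__ is not '__main__' (the child loads your script under the name '__mp_main__'), which is different from running the script on the command line directly, where __name__ == '__main__'. That is why the code in if __name__ == '__main__': is not executed in the child process, while everything outside it runs a second time.

The repetition is triggered by p.start(), which launches the child, not by p.join() or the if statement. The repeated lines only seem tied to p.join() because the parent is waiting there while the child runs. On Unix with Python 3.5 the default start method is fork, which copies the parent process instead of re-importing the script, which is probably why some people cannot reproduce this.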
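To see the re-import directly, here is a minimal sketch (not from the original post) that prints the name the file was loaded under. The helper names role and noop are made up for illustration, and the "spawn" start method is requested explicitly so that the behaviour should match Windows on other platforms too. It assumes the code is saved to a file and run as a script.

```python
import multiprocessing as mp


def role(module_name):
    # Classify an interpreter by the name its main module was loaded under.
    return 'parent' if module_name == '__main__' else 'child (re-import)'


def noop():
    # Hypothetical worker that does nothing; the interesting output
    # comes from the top-level print below.
    pass


# Top-level statement: runs in the parent, and again in every spawned
# child, because the child re-imports this file.
print('loaded as {!r} -> {}'.format(__name__, role(__name__)))

if __name__ == '__main__':
    # Request the "spawn" start method (the only one available on Windows).
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=noop)
    p.start()
    p.join()
```

Run as a script, the top-level print should appear twice: once for the parent under '__main__' and once for the child under '__mp_main__'.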

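Anything you don't want executed in the child process should be moved inside the if __name__ == '__main__': section of your code, as this will only run in the parent process, i.e. the script you run initially. The worker function itself (f in your case) has to stay at the top level of the module, so that the child can find it after the import.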
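Applied to the script from the question, that could look like the sketch below. Wrapping the parent-only code in a function called main is just one way to do it (the name is my choice, not something multiprocessing requires); the worker f and the messages are taken from the question.

```python
from multiprocessing import Process


def f(p_num):
    # Worker function: stays at module level so the child can import it.
    print('Doing Process: {}'.format(p_num))


def main():
    # Everything in here runs only in the parent process.
    data = 'The Data'
    print(data)
    print('Creating process')
    p = Process(target=f, args=(data,))
    print('Process made')
    p.start()
    print('process started')
    p.join()
    print('process joined')
    print('script finished')


if __name__ == '__main__':
    main()
```

With this layout the child still re-imports the file on Windows, but the import only defines f and main, so none of the print calls run a second time.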

Hope this helps a bit. The official Python documentation for the multiprocessing module explains this in more detail, and I recommend you look through it.

Peter answered Oct 06 '22 23:10