 

Python multiprocessing to execute scripts instead of function

1) Does the multiprocessing module support running a Python script file as a second process, instead of a function?

Currently I use multiprocessing.Process which takes a function but I would like to execute foo.py instead. I could use subprocess.Popen but the benefit of multiprocessing.Process is that I can pass objects (even if they are just pickled).


2) When I use multiprocessing.Process, why is my_module imported in the child process but print("foo") is not executed? How is my_module available although the main scope is not executed?

import multiprocessing
import my_module
print("foo")

def worker():
    print("bar")
    my_module.foo()
    return

p = multiprocessing.Process(target=worker)
p.start()
p.join()
Daniel Stephens asked Dec 07 '18

2 Answers

There is no fundamental difference between a Python function and the code you want to run in another process: a script's top-level code is just a procedure without a name.

Say the script file (foo.py in this context) you wish to run in another process contains the following:

# for demonstration only
from stuff import do_things

a = 'foo'
b = 1
do_things(a, b) # it doesn't matter what this does

You could refactor foo.py this way:

from stuff import do_things

def foo():
    a = 'foo'
    b = 1
    do_things(a, b)

And in the module you are spawning the process:

from foo import foo

p = multiprocessing.Process(target=foo)
# ...

The Process API requires that a "callable" be provided as the target. If, say, you tried to provide the module foo (where foo.py is the first version, without a function foo):

import foo
p = Process(target=foo)
p.start()

You will get TypeError: 'module' object is not callable, for a good reason: a module is not callable, and its top-level code has already executed eagerly at import time, since it is not wrapped inside a function/procedure, a.k.a. a callable. Try inserting a print statement in a module file and importing it. Module-level statements are evaluated right away.
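The "try inserting a print statement" experiment can be sketched without touching your real modules; `demo_mod` here is a made-up module written to a temporary directory, and it records execution in a list instead of printing:

```python
import pathlib
import sys
import tempfile

# create a throwaway module whose top-level code appends to a list
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "demo_mod.py").write_text(
    "executed = []\n"
    "executed.append('module level ran')\n"
)
sys.path.insert(0, tmp)

import demo_mod            # top-level statements run here, exactly once
import demo_mod as again   # cached in sys.modules; does NOT run again

print(demo_mod.executed)   # ['module level ran']
```

The second import hits the `sys.modules` cache, which is why the list still has a single entry.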

This answers question number 2:

When you imported my_module at the top level, it is imported once per process, even if worker is never executed. my_module is available inside worker because worker closes over the module-level name. When you pass a subroutine like worker to a concurrent process, there is no guarantee when, or even whether, it will be called.

You could import a module anywhere in a Python module, including within a function/subroutine. But doing so in this case might not be optimal or necessary.

Pandemonium answered Nov 14 '22


You can use multiprocessing.Pool() and pass the function you want to execute to one of its methods (such as map). I have personally used it because you can split the data into multiple parts and also control the number of CPUs used.

Vikika answered Nov 14 '22