I am looking to use the multiprocessing module to speed up the run time of some Transport Planning models. I've optimized as much as I can via 'normal' methods, but at the heart of it is an absurdly parallel problem, e.g. perform the same set of matrix operations for 4 different sets of inputs, all of which are independent of each other.
Pseudo Code:
for mat1, mat2, mat3, mat4 in zip([a1,a2,a3,a4], [b1,b2,b3,b4], [c1,c2,c3,c4], [d1,d2,d3,d4]):
    result1 = mat1*mat2^mat3
    result2 = mat1/mat4
    result3 = mat3.T*mat2.T + mat4
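In actual NumPy terms, each iteration would look roughly like this (a rough sketch only, assuming * means element-wise multiplication, ^ means element-wise power, and .T is a transpose):

import numpy as np

def one_iteration(mat1, mat2, mat3, mat4):
    # the three independent results for one set of inputs
    result1 = mat1 * mat2 ** mat3
    result2 = mat1 / mat4
    result3 = mat3.T * mat2.T + mat4
    return result1, result2, result3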
So all I really want to do is process the iterations of this loop in parallel on a quad core computer. I've read up here and other places on the multiprocessing module and it seems to fit the bill perfectly except for the required:
if __name__ == '__main__'
From what I understand, this means that you can only multiprocess code run from a script? i.e. if I do something like:
import multiprocessing
from numpy.random import randn

a = randn(100,100)
b = randn(100,100)
c = randn(100,100)
d = randn(100,100)

def process_matrix(mat):
    return mat**2

if __name__ == '__main__':
    print "Multiprocessing"
    jobs = []
    for input_matrix in [a, b, c, d]:
        p = multiprocessing.Process(target=process_matrix, args=(input_matrix,))
        jobs.append(p)
        p.start()
It runs fine. However, assuming I saved the above as 'matrix_multiproc.py' and defined a new file 'importing_test.py' which just states:
import matrix_multiproc
The multiprocessing does not happen, because __name__ is now 'matrix_multiproc' and not '__main__'.
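For instance, a module containing nothing but a print of its own __name__ (a hypothetical file, just to illustrate) shows the difference:

# whoami.py (hypothetical module name)
print(__name__)   # 'python whoami.py' prints: __main__
                  # 'import whoami'    prints: whoami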
Does this mean I can never use parallel processing on an imported module? All I am trying to do is have my model run as:
def Model_Run():
    import Part1, Part2, Part3, matrix_multiproc, Part4
    Part1.Run()
    Part2.Run()
    Part3.Run()
    matrix_multiproc.Run()
    Part4.Run()
Sorry for a really long question to what is probably a simple answer, thanks!
Does this mean I can never use parallel processing on an imported module?
No, it doesn't. You can use multiprocessing anywhere in your code, provided that the program's main module uses the if __name__ == '__main__' guard.

On Unix systems you won't even need that guard, since those systems provide the fork() system call, which creates child processes directly from the main Python process.

On Windows, on the other hand, fork() is emulated by multiprocessing by spawning a new process that runs the main module again, using a different __name__. Without the guard there, your main application would try to spawn new processes again, resulting in an endless loop and eating up all your computer's memory pretty fast.
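For the concrete case in the question, here is a minimal sketch of one way to arrange this (assuming a multiprocessing.Pool and a quad-core machine; the file names, the Run() function, and the squaring stand-in are illustrative, not your real model code). The parallel work lives behind Run() in matrix_multiproc.py, and the guard lives only in the top-level script:

# matrix_multiproc.py -- no guard needed here, since this module is
# only ever imported, never run as the main module
import multiprocessing
import numpy as np

def process_matrix(mat):
    # stand-in for the real per-iteration matrix work
    return mat ** 2

def Run():
    inputs = [np.random.randn(100, 100) for _ in range(4)]
    pool = multiprocessing.Pool(processes=4)    # one worker per core
    results = pool.map(process_matrix, inputs)  # runs the four jobs in parallel
    pool.close()
    pool.join()
    return results


# run_model.py -- the script that is actually executed; the guard lives here
import matrix_multiproc

if __name__ == '__main__':
    results = matrix_multiproc.Run()
    print(len(results))

The guard only needs to appear in whichever module is run as the script (run_model.py above); matrix_multiproc.Run() itself can be called freely from imported code such as your Model_Run().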