I use the Python 'multiprocessing' module to run single processes on multiple cores, but I want to run a couple of independent processes in parallel.
For example, Process1 parses large files, Process2 finds patterns in different files, and Process3 does some calculation; can these three different processes, each with its own set of arguments, be run in parallel?
import sys

def Process1(largefile):
    # Parse large file (runtime ~2 hrs)
    return parsed_file

def Process2(bigfile):
    # Find pattern in big file (runtime ~2.5 hrs)
    return pattern

def Process3(integer):
    # Do astronomical calculation (runtime ~2.25 hrs)
    return calculation_results

def FinalProcess(parsed, pattern, calc_results):
    # Do analysis (runtime ~10 min)
    return final_results

def main():
    parsed = Process1(largefile)
    pattern = Process2(bigfile)
    calc_res = Process3(integer)
    final = FinalProcess(parsed, pattern, calc_res)

if __name__ == '__main__':
    main()
    sys.exit()
In the above pseudo-code, Process1, Process2, and Process3 are single-core processes, i.e. none of them can run on multiple processors. Run sequentially, they take 2 + 2.5 + 2.25 = 6.75 hrs. Is it possible to run these three processes in parallel, so that they run at the same time on different processors/cores, and once the longest-running one (Process2) finishes we move on to FinalProcess?
We can also run the same function in parallel with different parameters using the Pool class. For parallel mapping, we first have to initialize a multiprocessing.Pool() object. The first argument is the number of workers; if not given, it defaults to the number of CPU cores on the system.
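For instance, here is a minimal sketch of parallel mapping with a default-sized pool (the cube function and its inputs are illustrative, not from the question):

from multiprocessing import Pool

def cube(x):
    return x ** 3

if __name__ == '__main__':
    # With no argument, Pool() starts one worker per CPU core.
    with Pool() as pool:
        print(pool.map(cube, range(8)))  # [0, 1, 8, ..., 343]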
Python provides mechanisms for both concurrency and parallelism, each with its own syntax and use cases. For concurrency in particular, Python has two different mechanisms, although they share many common components: threading and coroutines, or async.
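A minimal sketch contrasting the two concurrency flavors (the worker and task functions here are toy examples, purely illustrative):

import threading
import asyncio

# Threading: pre-emptive concurrency within a single process.
def worker(name):
    print(f"thread {name} running")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Coroutines/async: cooperative concurrency driven by an event loop.
async def task(name):
    await asyncio.sleep(0)  # yield control back to the event loop
    print(f"coroutine {name} running")

async def run_all():
    await asyncio.gather(*(task(i) for i in range(3)))

asyncio.run(run_all())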
From 16.6.1.5. Using a pool of workers:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))        # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))       # prints "[0, 1, 4, ..., 81]"
You can, therefore, apply_async against a pool and get your results after everything is ready; the total wall-clock time then drops to roughly the longest single job (2.5 hrs) plus the 10-minute FinalProcess, instead of 6.75 hrs.
from multiprocessing import Pool

# all your method declarations above go here
# (...)

def main():
    pool = Pool(processes=3)
    # submit all three jobs; they run concurrently on separate workers
    parsed = pool.apply_async(Process1, [largefile])
    pattern = pool.apply_async(Process2, [bigfile])
    calc_res = pool.apply_async(Process3, [integer])
    pool.close()  # no more tasks will be submitted to this pool
    pool.join()   # wait for all three workers to finish
    final = FinalProcess(parsed.get(), pattern.get(), calc_res.get())

# your __main__ handler goes here
# (...)
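Equivalently (not part of the original answer), the standard-library concurrent.futures module expresses the same pattern; a sketch reusing the question's function names, with largefile, bigfile, and integer assumed to be defined as in the pseudo-code:

from concurrent.futures import ProcessPoolExecutor

def main():
    with ProcessPoolExecutor(max_workers=3) as ex:
        parsed_f = ex.submit(Process1, largefile)
        pattern_f = ex.submit(Process2, bigfile)
        calc_f = ex.submit(Process3, integer)
        # .result() blocks until each future completes
        final = FinalProcess(parsed_f.result(), pattern_f.result(), calc_f.result())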