So, I have some code that takes an input and starts a Spark job on the cluster, something like:
spark-submit driver.py -i input_path
Now, I have a list of paths and I want to execute all of these simultaneously.
Here is what I tried
import subprocess

base_command = 'spark-submit driver.py -i %s'
for path in paths:
    command = base_command % path
    subprocess.Popen(command, shell=True)
My hope was that all of the shell commands would be executed simultaneously, but instead I am noticing that it executes one command at a time.
How do I execute all of the bash commands simultaneously? Thanks.
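A side note on the behaviour above: subprocess.Popen returns as soon as the child process is started, so one way to make the launches overlap is to start every job first and only then wait on the collected handles. A minimal sketch, assuming paths is the list of input paths:

import subprocess

base_command = 'spark-submit driver.py -i %s'

# Popen returns immediately, so all jobs are launched back to back
procs = [subprocess.Popen(base_command % path, shell=True) for path in paths]

# block until every launched job has finished
for proc in procs:
    proc.wait()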
This is where multiprocessing.Pool comes in; it is designed for exactly this case. It maps many inputs across a pool of workers automatically. Here is a good resource on how to use it.
import subprocess
from multiprocessing import Pool

def run_command(path):
    command = "spark-submit driver.py -i {}".format(path)
    # run the job and wait for it to finish so the worker stays occupied
    subprocess.call(command, shell=True)

pool = Pool()
pool.map(run_command, paths)
By default, Pool() creates one worker process per CPU core, and pool.map distributes the commands across those workers so they run in parallel, blocking until every path in paths has been processed.
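If you want to cap how many Spark jobs run at the same time, the pool size can be passed explicitly. Here is a minimal sketch of that usage; the pool size of 4, the example paths, and the return-code check are illustrative assumptions rather than part of the original answer:

import subprocess
from multiprocessing import Pool

def run_command(path):
    # subprocess.call blocks until spark-submit exits and returns its exit code
    return subprocess.call("spark-submit driver.py -i {}".format(path), shell=True)

if __name__ == "__main__":
    paths = ["/data/input_a", "/data/input_b", "/data/input_c"]  # hypothetical paths
    with Pool(processes=4) as pool:  # at most 4 jobs in flight at once (assumed limit)
        exit_codes = pool.map(run_command, paths)
    print(exit_codes)  # non-zero entries indicate failed jobs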