I have an array (called data_inputs) containing the names of hundreds of astronomy image files. These images are then manipulated. My code works and takes a few seconds to process each image. However, it can only do one image at a time because I'm running the array through a for loop:
    for name in data_inputs:
        sci = fits.open(name + '.fits')
        # image is manipulated
There is no reason why I have to modify an image before any other, so is it possible to utilise all 4 cores on my machine with each core running through the for loop on a different image?
I've read about the multiprocessing module but I'm unsure how to implement it in my case. I'm keen to get multiprocessing to work because eventually I'll have to run this on 10,000+ images.
To parallelize the loop, we can use Python's multiprocessing package, which lets a running process create child processes. Instead of stepping through a for loop, it can apply the same operation to every element of an iterable, spreading the elements across those processes.
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
In CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so threads give no speed-up for CPU-bound work that runs in the interpreter. This is what gives multiprocessing the upper hand over threading for this kind of task: each process has its own interpreter and its own GIL.
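To illustrate the threading-like API mentioned above, here is a minimal sketch that starts two child processes with multiprocessing.Process and collects their process IDs. The function names report_pid and collect_child_pids are made up for this illustration; they are not part of the question or of the multiprocessing module.

```python
import multiprocessing as mp
import os


def report_pid(queue):
    # Runs inside a child process and sends that process's ID to the parent.
    queue.put(os.getpid())


def collect_child_pids(n_children=2):
    """Start n_children processes and return the PIDs they report."""
    queue = mp.Queue()
    children = [
        mp.Process(target=report_pid, args=(queue,))
        for _ in range(n_children)
    ]
    for child in children:
        child.start()  # same start()/join() pattern as threading.Thread
    pids = [queue.get() for _ in children]  # read results before joining
    for child in children:
        child.join()
    return pids


if __name__ == "__main__":
    print("parent:", os.getpid(), "children:", collect_child_pids())
```

Each child reports a PID different from the parent's, which is the point: the work runs in separate interpreter processes rather than in threads sharing one GIL.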
Another way to speed up a loop in Python is to use built-in functions. In the example this tip comes from, a loop that adds up a range of numbers, the for loop can be replaced with the sum function, which sums the values in the range; that version takes 0.84 seconds.
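The code that tip refers to is not shown here, so the following is only a sketch of the idea: the same total computed once with an explicit for loop and once with the built-in sum. The function names and the value of n are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import timeit


def total_with_loop(n):
    # Explicit Python-level loop: one bytecode iteration per number.
    total = 0
    for i in range(n):
        total += i
    return total


def total_with_sum(n):
    # Built-in sum does the iteration in C.
    return sum(range(n))


if __name__ == "__main__":
    n = 1_000_000  # assumed size, not taken from the original text
    print("loop:", timeit.timeit(lambda: total_with_loop(n), number=5))
    print("sum: ", timeit.timeit(lambda: total_with_sum(n), number=5))
```

This only helps when the loop body is something a built-in already does. It does not apply to the image-processing loop in the question, where each iteration does real work on a file.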
You can simply use multiprocessing.Pool:
    from multiprocessing import Pool

    # fits and data_inputs are defined as in the question
    # (fits would typically come from e.g. astropy.io)

    def process_image(name):
        sci = fits.open('{}.fits'.format(name))
        # <process>

    if __name__ == '__main__':
        pool = Pool()                         # Create a multiprocessing Pool
        pool.map(process_image, data_inputs)  # process data_inputs iterable with pool
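For the 10,000+ image case, a variation on the Pool approach may be worth considering. The sketch below is an assumption-laden illustration, not the answer's own code: process_image is a stand-in that does no real FITS work, process_all and the generated file names are hypothetical, and the workers and chunksize values are guesses to tune. It uses the pool as a context manager so the workers are cleaned up, and imap_unordered so results come back as soon as each image finishes.

```python
from multiprocessing import Pool


def process_image(name):
    # Hypothetical stand-in for the real work. In the question this is
    # where '<name>.fits' is opened with fits.open and manipulated.
    return name, len(name)


def process_all(names, workers=4, chunksize=16):
    """Run process_image over names using `workers` processes."""
    with Pool(processes=workers) as pool:
        # imap_unordered yields each result as soon as it is ready;
        # chunksize batches names to cut inter-process overhead.
        results = pool.imap_unordered(process_image, names, chunksize=chunksize)
        return dict(results)  # consume everything before the pool exits


if __name__ == "__main__":
    # Hypothetical names standing in for the real data_inputs array.
    data_inputs = ["image_{:05d}".format(i) for i in range(100)]
    results = process_all(data_inputs)
    print(len(results), "images processed")
```

Because the order of results is not preserved, each result carries its name so it can be matched back to its input. If order matters, pool.map or pool.imap keep the input order.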