
Multiprocessing a for loop?

I have an array (called data_inputs) containing the names of hundreds of astronomy image files. These images are then manipulated. My code works and takes a few seconds to process each image, but it can only do one image at a time because I'm running the array through a for loop:

for name in data_inputs:
    sci = fits.open(name + '.fits')
    # image is manipulated

There is no reason why I have to modify an image before any other, so is it possible to utilise all 4 cores on my machine with each core running through the for loop on a different image?

I've read about the multiprocessing module but I'm unsure how to implement it in my case. I'm keen to get multiprocessing to work because eventually I'll have to run this on 10,000+ images.

asked Nov 25 '13 by ChrisFro


People also ask

How do you do a multiprocess loop in Python?

To parallelize the loop, we can use the multiprocessing package in Python, as it supports creating a child process at the request of another ongoing process. The multiprocessing module can be used to execute the operation on every element of the iterable in parallel instead of iterating through it with a for loop.
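For illustration, a minimal sketch of that pattern (the square function and the input range are made up for the example):

from multiprocessing import Pool

def square(x):
    # Stand-in for the per-item work; each call runs in a worker process
    return x * x

if __name__ == '__main__':
    with Pool() as pool:                       # one worker per CPU core by default
        results = pool.map(square, range(10))  # results come back in input order
    print(results)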

What is multiprocessing in Python?

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
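A small sketch of that threading-like API (the worker function and the number of processes are arbitrary here):

from multiprocessing import Process

def worker(n):
    print('processing item', n)

if __name__ == '__main__':
    # Same start()/join() pattern as threading.Thread, but each worker is a
    # separate OS process with its own interpreter and its own GIL
    procs = [Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()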

Can we do multiprocessing in Python?

Python's Global Interpreter Lock (GIL) only allows one thread to run at a time in the interpreter, which means multithreading gives no speed-up when the work is CPU-bound Python code. This is what gives multiprocessing an upper hand over threading in Python.
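A rough sketch of that difference for CPU-bound work (the function and the work sizes are invented; actual timings depend on the machine):

import time
from multiprocessing import Pool

def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL, so threads could not run it in parallel
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    work = [2_000_000] * 4

    start = time.perf_counter()
    [cpu_bound(n) for n in work]           # one core, one item at a time
    print('sequential:', time.perf_counter() - start)

    start = time.perf_counter()
    with Pool() as pool:
        pool.map(cpu_bound, work)          # items spread across worker processes
    print('multiprocessing:', time.perf_counter() - start)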

How do you speed up a loop in Python?

A faster way to loop in Python is to use built-in functions. For example, a for loop that adds up a range of numbers can be replaced with the sum function, which does the summation in C rather than in a Python-level loop.
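A small sketch of that replacement (the range size and repetition count are arbitrary; exact timings vary by machine):

import timeit

def loop_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

# sum(range(n)) pushes the loop into C, so it is noticeably faster than the
# equivalent Python-level for loop
print(timeit.timeit(lambda: loop_sum(1_000_000), number=10))
print(timeit.timeit(lambda: sum(range(1_000_000)), number=10))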


1 Answer

You can simply use multiprocessing.Pool:

from multiprocessing import Pool

def process_image(name):
    sci = fits.open('{}.fits'.format(name))
    # process the image here

if __name__ == '__main__':
    pool = Pool()                         # Create a multiprocessing Pool
    pool.map(process_image, data_inputs)  # process data_inputs iterable with pool
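Since the question mentions 4 cores and 10,000+ images, a variant of the same call can cap the pool size and batch the work; the worker count and chunksize below are illustrative, not required values:

if __name__ == '__main__':
    # Cap the pool at 4 workers to match the 4 cores; chunksize hands each
    # worker a batch of names at a time so less time is spent fetching tasks
    with Pool(processes=4) as pool:
        pool.map(process_image, data_inputs, chunksize=25)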
answered Oct 19 '22 by alko