Suppose I have a single function, processing. I want to run the same function multiple times with different parameters in parallel, instead of sequentially one after the other.
import rasterio

def processing(image_location):
    image = rasterio.open(image_location)
    ...
    ...
    return result
# Calling the function serially, one after the other, with different parameters and saving the results to variables.
results1 = processing(r'/home/test/image_1.tif')
results2 = processing(r'/home/test/image_2.tif')
results3 = processing(r'/home/test/image_3.tif')
For example, if I run processing(r'/home/test/image_1.tif'), then processing(r'/home/test/image_2.tif'), and then processing(r'/home/test/image_3.tif'), as shown in the above code, they will run sequentially one after the other; if one call takes 5 minutes, running all three will take 5x3 = 15 minutes. Hence, I am wondering if I can run these three calls in parallel (they are embarrassingly parallel) so that it takes only 5 minutes to execute the function for all three parameters.
Help me with the fastest way to do this job. The script should be able to utilize all of the available resources (CPU/RAM) by default to do this task.
To run functions in parallel with Python, we can use the multiprocessing module. We create a Process for each call we want to run, start them with start(), and wait for them to finish with join().
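A minimal sketch using the processing() function from the question (Process does not hand return values back directly, so use a Pool, shown further below, if you need the results in the parent process):

from multiprocessing import Process

if __name__ == '__main__':
    paths = [r'/home/test/image_1.tif',
             r'/home/test/image_2.tif',
             r'/home/test/image_3.tif']
    procs = [Process(target=processing, args=(p,)) for p in paths]
    for p in procs:
        p.start()   # launch each worker process
    for p in procs:
        p.join()    # wait for all of them to finish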
One of Python's main weaknesses is that the Global Interpreter Lock (GIL) prevents a single process from executing Python bytecode on more than one core at a time. Python still offers two ways to run work concurrently: multithreading and multiprocessing. The most common way is the threading module, which lets you create threads that execute independently of each other; because of the GIL, this mainly helps when the work is I/O-bound. The multiprocessing module instead creates separate processes that execute independently of each other, which sidesteps the GIL and gives true parallelism for CPU-bound work.
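A minimal threading sketch, again assuming the processing() function from the question; it only pays off if processing() spends most of its time waiting on I/O:

import threading

paths = [r'/home/test/image_1.tif',
         r'/home/test/image_2.tif',
         r'/home/test/image_3.tif']
threads = [threading.Thread(target=processing, args=(p,)) for p in paths]
for t in threads:
    t.start()   # launch each thread
for t in threads:
    t.join()    # wait for all threads to finish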
We can also run the same function in parallel with different parameters using the Pool class. multiprocessing.Pool() spawns a set of processes called workers, and tasks can be submitted with the methods apply/apply_async and map/map_async. For parallel mapping, first initialize a multiprocessing.Pool() object; its first argument is the number of workers, and if it is not given it defaults to the number of CPU cores on the system.
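A minimal Pool sketch for the question's use case, assuming processing() is defined in the same script:

from multiprocessing import Pool

if __name__ == '__main__':
    images = [r'/home/test/image_1.tif',
              r'/home/test/image_2.tif',
              r'/home/test/image_3.tif']
    with Pool() as pool:                         # one worker per CPU core by default
        results = pool.map(processing, images)  # one result per image, same order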
The asyncio module is another option. It is single-threaded and runs an event loop that suspends coroutines via await (or yield from in older code), so by itself it provides concurrency for I/O-bound work rather than CPU parallelism. It can, however, hand blocking calls off to a pool of worker processes with run_in_executor, so the event loop keeps running while the heavy work proceeds in parallel.
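A sketch combining asyncio with a process pool, again assuming the processing() function from the question:

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def main():
    loop = asyncio.get_running_loop()
    images = [r'/home/test/image_1.tif',
              r'/home/test/image_2.tif',
              r'/home/test/image_3.tif']
    with ProcessPoolExecutor() as executor:   # one worker per CPU core by default
        tasks = [loop.run_in_executor(executor, processing, img) for img in images]
        results = await asyncio.gather(*tasks)  # results in the original order
    return results

if __name__ == '__main__':
    results = asyncio.run(main())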
You can use multiprocessing to execute functions in parallel and save the results to a results variable:
from multiprocessing.pool import ThreadPool

pool = ThreadPool()   # by default, one worker thread per CPU core
images = [r'/home/test/image_1.tif', r'/home/test/image_2.tif', r'/home/test/image_3.tif']
results = pool.map(processing, images)   # list of results, in the same order as images
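Note that ThreadPool uses threads, which share the GIL; if processing() is CPU-bound rather than I/O-bound, you can swap in multiprocessing.Pool with the exact same map() call to get true parallelism across cores.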
You might want to take a look at IPython Parallel. It allows you to easily run functions on a load-balanced (local) cluster.
For this little example, make sure you have IPython Parallel, NumPy and Pillow installed. To run the example, you first need to launch the cluster. To launch a local cluster with four parallel engines, type into a terminal (one engine per processor core seems a reasonable choice):

ipcluster start -n 4
Then you can run the following script, which searches for jpg-images in a given directory and counts the number of pixels in each image:
import ipyparallel as ipp

rc = ipp.Client()

with rc[:].sync_imports():  # import on all engines
    import numpy
    from pathlib import Path
    from PIL import Image

lview = rc.load_balanced_view()  # default load-balanced view
lview.block = True               # block until map() is finished

@lview.parallel()
def count_pixels(fn: Path):
    """Silly function to count the number of pixels in an image file"""
    im = Image.open(fn)
    xx = numpy.asarray(im)
    num_pixels = xx.shape[0] * xx.shape[1]
    return fn.stem, num_pixels

pic_dir = Path('Pictures')
fn_lst = pic_dir.glob('*.jpg')       # all jpg-files in pic_dir
results = count_pixels.map(fn_lst)   # execute in parallel

for n_, cnt in results:
    print(f"'{n_}' has {cnt} pixels.")