Fastest way to run a single function in python in parallel for multiple parameters

Suppose I have a single function, processing(). I want to run the same function multiple times with different parameters in parallel instead of sequentially, one after the other.

import rasterio

def processing(image_location):
    image = rasterio.open(image_location)
    ...
    ...
    return result

# Calling the function serially, one after the other, with different parameters
# and saving each result to a variable.
results1 = processing(r'/home/test/image_1.tif')
results2 = processing(r'/home/test/image_2.tif')
results3 = processing(r'/home/test/image_3.tif')

For example, if I run processing(r'/home/test/image_1.tif'), then processing(r'/home/test/image_2.tif'), and then processing(r'/home/test/image_3.tif'), as shown in the code above, they run sequentially, one after the other: if a single call takes 5 minutes, the three calls take 5 x 3 = 15 minutes. Hence, I am wondering whether I can run these three calls in parallel (they are embarrassingly parallel) so that executing the function for all three parameters takes only 5 minutes.

Help me with the fastest way to do this job. By default, the script should be able to utilize all the available resources (CPU and RAM) for this task.

asked Sep 11 '20 by mArk



2 Answers

You can use a thread pool from the multiprocessing package to execute the function in parallel and save all the results to a single results variable:

from multiprocessing.pool import ThreadPool

pool = ThreadPool()  # defaults to one worker per CPU core
images = [r'/home/test/image_1.tif', r'/home/test/image_2.tif', r'/home/test/image_3.tif']
results = pool.map(processing, images)
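
Note that ThreadPool runs the calls in threads of a single process; that helps when the heavy lifting happens in I/O or in C extensions that release the GIL, but CPU-bound pure-Python code usually scales better with a process pool. Below is a minimal process-based sketch with the same map interface; the module name my_module is hypothetical and just stands for wherever processing() is defined at module top level, so the worker processes can import it:

from multiprocessing import Pool

from my_module import processing  # hypothetical: processing() must live at module top level

if __name__ == '__main__':
    images = [r'/home/test/image_1.tif',
              r'/home/test/image_2.tif',
              r'/home/test/image_3.tif']
    # Pool() starts one worker process per CPU core by default.
    with Pool() as pool:
        results = pool.map(processing, images)  # results come back in input order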
answered Oct 18 '22 by Alderven


You might want to take a look at IPython Parallel. It allows you to easily run functions on a load-balanced (local) cluster.

For this little example, make sure you have IPython Parallel, NumPy, and Pillow installed. Before running the example, you first need to launch the cluster. To launch a local cluster with four parallel engines, type the following into a terminal (one engine per processor core seems a reasonable choice):

ipcluster start -n 4

Then you can run the following script, which searches for JPG images in a given directory and counts the number of pixels in each one:

import ipyparallel as ipp


rc = ipp.Client()
with rc[:].sync_imports():  # import on all engines
    import numpy
    from pathlib import Path
    from PIL import Image


lview = rc.load_balanced_view()  # default load-balanced view
lview.block = True  # block until map() is finished


@lview.parallel()
def count_pixels(fn: Path):
    """Silly function to count the number of pixels in an image file"""
    im = Image.open(fn)
    xx = numpy.asarray(im)
    num_pixels = xx.shape[0] * xx.shape[1]
    return fn.stem, num_pixels


pic_dir = Path('Pictures')
fn_lst = pic_dir.glob('*.jpg')  # all JPG files in pic_dir

results = count_pixels.map(fn_lst)  # execute in parallel

for n_, cnt in results:
    print(f"'{n_}' has {cnt} pixels.")
answered Oct 18 '22 by Dietrich