Suppose I have a single function, processing. I want to run the same function multiple times with different parameters in parallel, instead of sequentially one after the other.
import rasterio

def processing(image_location):
    image = rasterio.open(image_location)
    ...
    ...
    return result
# Calling the function serially, one after the other, with different parameters and saving the results to variables.
results1 = processing(r'/home/test/image_1.tif')
results2 = processing(r'/home/test/image_2.tif')
results3 = processing(r'/home/test/image_3.tif')
For example, if I run processing(r'/home/test/image_1.tif'), then processing(r'/home/test/image_2.tif'), and then processing(r'/home/test/image_3.tif'), as shown in the above code, they will run sequentially one after the other; if one call takes 5 minutes, running all three will take 5x3 = 15 minutes. Hence, I am wondering if I can run these three calls in parallel (they are embarrassingly parallel) so that it takes only 5 minutes to execute the function for all three parameters.
Help me with the fastest way to do this job. The script should be able to utilize all of the available resources (CPU/RAM) by default to do this task.
To run functions in parallel with Python, we can use the multiprocessing module. We create a Process for each call we want to run, start them with start(), and wait for them to finish with join().
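A minimal sketch using the processing() function from the question (Process does not hand return values back directly, so use a Pool, shown further below, if you need the results in the parent process):

from multiprocessing import Process

if __name__ == '__main__':
    paths = [r'/home/test/image_1.tif',
             r'/home/test/image_2.tif',
             r'/home/test/image_3.tif']
    procs = [Process(target=processing, args=(p,)) for p in paths]
    for p in procs:
        p.start()   # launch each worker process
    for p in procs:
        p.join()    # wait for all of them to finish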
One of Python's main weaknesses is that the Global Interpreter Lock (GIL) prevents a single process from executing Python bytecode on more than one core at a time. Python still offers two ways to run work concurrently: multithreading and multiprocessing. The most common way is the threading module, which lets you create threads that execute independently of each other; because of the GIL, this mainly helps when the work is I/O-bound. The multiprocessing module instead creates separate processes that execute independently of each other, which sidesteps the GIL and gives true parallelism for CPU-bound work.
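A minimal threading sketch, again assuming the processing() function from the question; it only pays off if processing() spends most of its time waiting on I/O:

import threading

paths = [r'/home/test/image_1.tif',
         r'/home/test/image_2.tif',
         r'/home/test/image_3.tif']
threads = [threading.Thread(target=processing, args=(p,)) for p in paths]
for t in threads:
    t.start()   # launch each thread
for t in threads:
    t.join()    # wait for all threads to finish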
We can also run the same function in parallel with different parameters using the Pool class. multiprocessing.Pool() spawns a set of processes called workers, and tasks can be submitted with the methods apply/apply_async and map/map_async. For parallel mapping, first initialize a multiprocessing.Pool() object; its first argument is the number of workers, and if it is not given it defaults to the number of CPU cores on the system.
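A minimal Pool sketch for the question's use case, assuming processing() is defined in the same script:

from multiprocessing import Pool

if __name__ == '__main__':
    images = [r'/home/test/image_1.tif',
              r'/home/test/image_2.tif',
              r'/home/test/image_3.tif']
    with Pool() as pool:                         # one worker per CPU core by default
        results = pool.map(processing, images)  # one result per image, same order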
The asyncio module is another option. It is single-threaded and runs an event loop that suspends coroutines via await (or yield from in older code), so by itself it provides concurrency for I/O-bound work rather than CPU parallelism. It can, however, hand blocking calls off to a pool of worker processes with run_in_executor, so the event loop keeps running while the heavy work proceeds in parallel.
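A sketch combining asyncio with a process pool, again assuming the processing() function from the question:

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def main():
    loop = asyncio.get_running_loop()
    images = [r'/home/test/image_1.tif',
              r'/home/test/image_2.tif',
              r'/home/test/image_3.tif']
    with ProcessPoolExecutor() as executor:   # one worker per CPU core by default
        tasks = [loop.run_in_executor(executor, processing, img) for img in images]
        results = await asyncio.gather(*tasks)  # results in the original order
    return results

if __name__ == '__main__':
    results = asyncio.run(main())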
You can use multiprocessing to execute functions in parallel and save the results to a results variable:
from multiprocessing.pool import ThreadPool

pool = ThreadPool()   # by default, one worker thread per CPU core
images = [r'/home/test/image_1.tif', r'/home/test/image_2.tif', r'/home/test/image_3.tif']
results = pool.map(processing, images)   # list of results, in the same order as images
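Note that ThreadPool uses threads, which share the GIL; if processing() is CPU-bound rather than I/O-bound, you can swap in multiprocessing.Pool with the exact same map() call to get true parallelism across cores.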
You might want to take a look at IPython Parallel. It allows you to easily run functions on a load-balanced (local) cluster.
For this little example, make sure you have IPython Parallel, NumPy and Pillow installed. To run the example, you first need to launch the cluster. To launch a local cluster with four parallel engines, type into a terminal (one engine per processor core seems a reasonable choice):

ipcluster start -n 4
Then you can run the following script, which searches for jpg-images in a given directory and counts the number of pixels in each image:
import ipyparallel as ipp

rc = ipp.Client()

with rc[:].sync_imports():  # import on all engines
    import numpy
    from pathlib import Path
    from PIL import Image

lview = rc.load_balanced_view()  # default load-balanced view
lview.block = True               # block until map() is finished

@lview.parallel()
def count_pixels(fn: Path):
    """Silly function to count the number of pixels in an image file"""
    im = Image.open(fn)
    xx = numpy.asarray(im)
    num_pixels = xx.shape[0] * xx.shape[1]
    return fn.stem, num_pixels

pic_dir = Path('Pictures')
fn_lst = pic_dir.glob('*.jpg')       # all jpg-files in pic_dir
results = count_pixels.map(fn_lst)   # execute in parallel

for n_, cnt in results:
    print(f"'{n_}' has {cnt} pixels.")