Python - Loop parallelisation with joblib

I would like some help understanding what I have done and why my code isn't running as I would expect.

I have started to use joblib to try and speed up my code by running a (large) loop in parallel.

I am using it like so:

import numpy as np
from joblib import Parallel, delayed

def frame(indeces, image_pad, m):
    # Cut three orthogonal m x m patches (XY, XZ, YZ) starting at the voxel given by indeces.
    XY_Patches = np.float32(image_pad[indeces[0]:indeces[0]+m, indeces[1]:indeces[1]+m,  indeces[2]])
    XZ_Patches = np.float32(image_pad[indeces[0]:indeces[0]+m, indeces[1],                  indeces[2]:indeces[2]+m])
    YZ_Patches = np.float32(image_pad[indeces[0],                 indeces[1]:indeces[1]+m,  indeces[2]:indeces[2]+m])

    return XY_Patches, XZ_Patches, YZ_Patches


def Patch_triplanar_para(image_path, patch_size):

    # Sampling is defined elsewhere (not shown here).
    Image, Label, indeces =  Sampling(image_path)

    # Integer division: np.pad needs an integer pad width (plain / yields a float in Python 3).
    n = (patch_size - 1) // 2
    m = patch_size

    image_pad = np.pad(Image, pad_width=n, mode='constant', constant_values=0)

    # One task per sampled index; n_jobs is the value I have been varying.
    A = Parallel(n_jobs=1)(delayed(frame)(i, image_pad, m) for i in indeces)
    A = np.array(A)
    Label = np.float32(Label.reshape(len(Label), 1))
    R, T, Y =  np.hsplit(A, 3)

    return R, T, Y, Label

I have been experimenting with "n_jobs", expecting that increasing it would speed up my function. However, as I increase n_jobs, things slow down quite significantly. When I run this code without "Parallel", things are slower, until I increase the number of jobs above 1.

Why is this the case? I understood that the more jobs I run, the faster the script would be. Am I using this wrong?

Thanks!

705

asked May 31 '16 12:05

JB1

2 Answers

One likely cause is that image_pad is a large array. Your code uses joblib's default process-based backend (multiprocessing at the time of writing; newer joblib releases default to loky, which also uses separate processes). This backend creates a pool of workers, each of which is a separate Python process. The input data to the function is then copied n_jobs times and sent to each worker in the pool, which can add serious overhead. Quoting from joblib's docs:

By default the workers of the pool are real Python processes forked using the multiprocessing module of the Python standard library when n_jobs != 1. The arguments passed as input to the Parallel call are serialized and reallocated in the memory of each worker process.

This can be problematic for large arguments as they will be reallocated n_jobs times by the workers.

As this problem can often occur in scientific computing with numpy based datastructures, joblib.Parallel provides a special handling for large arrays to automatically dump them on the filesystem and pass a reference to the worker to open them as memory map on that file using the numpy.memmap subclass of numpy.ndarray. This makes it possible to share a segment of data between all the worker processes.

Note: The following only applies with the default "multiprocessing" backend. If your code can release the GIL, then using backend="threading" is even more efficient.
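To get a feel for the serialization cost the quoted passage describes, here is a minimal sketch that compares the pickled size of a whole volume with the pickled size of one patch. The array shape and the helper name pickled_size are assumptions made up for illustration, and joblib's own transport may differ from a plain pickle (for example through the automatic memmapping mentioned above).

```python
import pickle

import numpy as np


def pickled_size(obj):
    """Return how many bytes a plain pickle of obj occupies (hypothetical helper)."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Hypothetical stand-in for image_pad; the real volume is probably much larger.
image_pad = np.zeros((64, 64, 64), dtype=np.float64)
m = 5

# One m x m patch, as produced by a single call to frame().
patch = np.float32(image_pad[0:m, 0:m, 0])

full_bytes = pickled_size(image_pad)
patch_bytes = pickled_size(patch)

print("whole array:", full_bytes, "bytes")
print("one patch:  ", patch_bytes, "bytes")
```

If every task had to receive the whole array this way, the data movement would dwarf the tiny amount of slicing work done per task.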

So if this is your case, you should switch to the threading backend, if you are able to release the global interpreter lock when calling frame, or switch to the shared memory approach of joblib.
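A minimal sketch of the threading option, assuming the same frame function as in the question. The wrapper name extract_patches_threaded is made up for illustration. Threads share image_pad instead of copying it, but whether this is actually faster depends on how much of the work releases the GIL, so treat it as something to measure rather than a guaranteed fix.

```python
import numpy as np
from joblib import Parallel, delayed


def frame(indeces, image_pad, m):
    """Cut three orthogonal m x m patches starting at the voxel in indeces."""
    i, j, k = indeces
    xy = np.float32(image_pad[i:i + m, j:j + m, k])
    xz = np.float32(image_pad[i:i + m, j, k:k + m])
    yz = np.float32(image_pad[i, j:j + m, k:k + m])
    return xy, xz, yz


def extract_patches_threaded(image_pad, indeces, m, n_jobs=2):
    """Run frame() over all indices with joblib's threading backend.

    Hypothetical wrapper: threads see the same image_pad object,
    so nothing is serialized or copied per worker.
    """
    return Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(frame)(idx, image_pad, m) for idx in indeces
    )
```

Calling extract_patches_threaded(image_pad, indeces, m) returns a list with one (XY, XZ, YZ) tuple per index, in the same order as the input.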

The docs say that joblib provides an automated memmap conversion that could be useful.
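The memmap route can also be taken by hand. The sketch below dumps the array to a temporary file with joblib.dump, reopens it with joblib.load(..., mmap_mode="r"), and passes the memory-mapped array to the workers. The function names xy_patch and extract_with_memmap and the temporary file name are assumptions for illustration, and only the XY patch is extracted to keep it short.

```python
import os
import shutil
import tempfile

import numpy as np
from joblib import Parallel, delayed, dump, load


def xy_patch(indeces, image_pad, m):
    """Return a float32 copy of the m x m XY patch starting at indeces."""
    i, j, k = indeces
    return np.array(image_pad[i:i + m, j:j + m, k], dtype=np.float32)


def extract_with_memmap(image_pad, indeces, m, n_jobs=2):
    """Hypothetical helper: share image_pad with workers through a memmap."""
    tmp_dir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmp_dir, "image_pad.joblib")
        dump(image_pad, path)              # write the array to disk once
        shared = load(path, mmap_mode="r")  # reopen it as a read-only memmap
        patches = Parallel(n_jobs=n_jobs)(
            delayed(xy_patch)(idx, shared, m) for idx in indeces
        )
        del shared                          # drop the mapping before cleanup
        return patches
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Workers then open the same file instead of each receiving a private copy of the array. The patches are copied out of the memmap, so they stay valid after the temporary file is removed.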

126

answered Sep 19 '22 22:09

lucianopaz

It's quite possible that the problem you are running up against is fundamental to the nature of the Python interpreter.

If you read "https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Is_Not_C?lang=en", written by a professional who specialises in optimising and parallelising Python code, you can see that iterating through large loops is an inherently slow operation in Python. Therefore, spawning more processes that loop through arrays is only going to slow things down.

However - there are things that can be done.

The Cython and Numba compilers are both designed to optimise code written in a C/C++-like style (i.e. your case). In particular, Numba's @vectorize decorator lets a scalar function be applied element-wise to large arrays in parallel (target='parallel').

I don't understand your code well enough to give an example of an implementation, but try this! These compilers, used in the correct ways, have given me speed increases of 3000,000% for parallel processes in the past!
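A different way to avoid the per-index Python loop, which neither answer proposes, is plain NumPy integer-array indexing with broadcasting. The sketch below extracts all three patch stacks in one go. It assumes the indices form an (N, 3) integer array and that every patch lies inside image_pad. The function name extract_triplanar is made up for illustration.

```python
import numpy as np


def extract_triplanar(image_pad, indeces, m):
    """Extract XY, XZ and YZ patches for all indices at once.

    Returns three float32 arrays of shape (N, m, m). Hypothetical helper:
    assumes indeces has shape (N, 3) and all patches are in bounds.
    """
    idx = np.asarray(indeces)
    # Reshape each coordinate to (N, 1, 1) so it broadcasts against offsets.
    i = idx[:, 0][:, None, None]
    j = idx[:, 1][:, None, None]
    k = idx[:, 2][:, None, None]
    off = np.arange(m)
    rows = off[None, :, None]  # offsets along the first patch axis
    cols = off[None, None, :]  # offsets along the second patch axis

    xy = image_pad[i + rows, j + cols, k]
    xz = image_pad[i + rows, j, k + cols]
    yz = image_pad[i, j + rows, k + cols]
    return xy.astype(np.float32), xz.astype(np.float32), yz.astype(np.float32)
```

Each returned stack matches what a loop over frame would produce for the same indices, without a Python-level iteration per index. The trade-off is memory: all N patches for each plane are materialised at once.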

answered Sep 22 '22 22:09

Isky Mathews

Tags: python, parallel-processing, numpy, joblib
