I have the following problem.
I need to process a set of documents (bringing every word to its normal form, e.g. 'was' --> 'be', 'were' --> 'be', 'went' --> 'go').
That means I need to open each file in a directory, transform its contents, and save the result to another directory.
Since the process is time-consuming, I decided to parallelize it with joblib.
The code below works correctly (I mean, it does what it is supposed to do), but I ran into a huge problem with memory.
It keeps growing constantly, until there is no memory left on the server at all.
from joblib import delayed, Parallel


def process_text(text):
    # some function which processes
    # text and returns a new text
    return processed_text


def process_and_save(document_id):
    with open(path + document_id) as f:
        text = f.read()
    text = process_text(text)
    with open(other_path + document_id, 'w') as f:
        f.write(text)


all_doc_ids = # a list of document ids which I need to process

Parallel(n_jobs=10)(delayed(process_and_save)(doc_id) for doc_id in all_doc_ids)
I've also tried replacing joblib with multiprocessing:
from multiprocessing import Pool

pool = Pool(10)
pool.map(process_and_save, all_doc_ids)
But the situation turned out to be exactly the same.
Are there any ways to solve the problem? And, of course, the main question is, why is this even happening?
Thank you!
P.S. The documents are quite small and the process consumes very little memory when running without parallelism.
The delayed function is a simple trick that lets you build a (function, args, kwargs) tuple using function-call syntax. On Windows, using multiprocessing.Pool requires protecting the main block of code, otherwise subprocesses are spawned recursively when using joblib.
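A minimal sketch of that guard, assuming the process_and_save function and all_doc_ids list from the question (the function body and document ids below are hypothetical placeholders):

from joblib import delayed, Parallel


def process_and_save(document_id):
    # placeholder body; the real function is the one from the question
    print('processing', document_id)


if __name__ == '__main__':
    # Only the main process executes this block; on Windows, worker
    # processes re-import the module and would otherwise re-run the
    # Parallel call recursively.
    all_doc_ids = ['doc1.txt', 'doc2.txt']  # hypothetical ids for illustration
    Parallel(n_jobs=2)(delayed(process_and_save)(doc_id) for doc_id in all_doc_ids)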
Joblib provides three different backends: loky (default), threading, and multiprocessing.
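A small illustrative sketch of selecting a backend explicitly via the backend argument of Parallel (square is just a stand-in function, not anything from the question):

from joblib import delayed, Parallel


def square(x):
    return x * x


if __name__ == '__main__':
    # loky (default): separate worker processes with memory-leak safeguards
    print(Parallel(n_jobs=2, backend='loky')(delayed(square)(i) for i in range(5)))

    # threading: threads in the current process, useful when the work releases the GIL
    print(Parallel(n_jobs=2, backend='threading')(delayed(square)(i) for i in range(5)))

    # multiprocessing: the legacy process-based backend built on multiprocessing.Pool
    print(Parallel(n_jobs=2, backend='multiprocessing')(delayed(square)(i) for i in range(5)))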
Parameters: n_jobs: int, default: None. The maximum number of concurrently running jobs, such as the number of Python worker processes when backend='multiprocessing' or the size of the thread pool when backend='threading'.
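For illustration, a short sketch of what n_jobs controls (double is a made-up stand-in function):

from joblib import delayed, Parallel


def double(x):
    return 2 * x


if __name__ == '__main__':
    # n_jobs caps the number of workers: -1 means "use all available CPUs"
    # with the default process-based backend.
    print(Parallel(n_jobs=-1)(delayed(double)(i) for i in range(8)))
    # With backend='threading' the same number sets the thread-pool size.
    print(Parallel(n_jobs=4, backend='threading')(delayed(double)(i) for i in range(8)))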
It seems this memory leak issue has been resolved in the latest version of joblib.
They introduced the loky backend as a safeguard against memory leaks:
Parallel(n_jobs=10, backend='loky')(delayed(process_and_save)(doc_id) for doc_id in all_doc_ids)
source: Memory Release after parallel
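If upgrading joblib is not an option, one common workaround for per-task memory growth, not taken from the answer above, is to recycle worker processes with multiprocessing's maxtasksperchild, so each worker is replaced after a fixed number of tasks and its accumulated memory is returned to the OS (a sketch with a placeholder process_and_save and hypothetical document ids):

from multiprocessing import Pool


def process_and_save(document_id):
    # placeholder; use the real function from the question
    print('processing', document_id)


if __name__ == '__main__':
    all_doc_ids = ['doc1.txt', 'doc2.txt']  # hypothetical ids for illustration
    # maxtasksperchild=50 makes each worker exit and be replaced after 50
    # tasks, which releases any memory it has accumulated.
    with Pool(processes=10, maxtasksperchild=50) as pool:
        pool.map(process_and_save, all_doc_ids)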