Here's the program:
#!/usr/bin/python
import multiprocessing

def dummy_func(r):
    pass

def worker():
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=16)
    for index in range(0, 100000):
        pool.apply_async(worker, callback=dummy_func)
    # clean up
    pool.close()
    pool.join()
I found that memory usage (both VIRT and RES) kept growing until close()/join(). Is there any way to get rid of this? I tried maxtasksperchild with Python 2.7, but it didn't help either.
I have a more complicated program that calls apply_async() ~6M times, and at around the ~1.5M mark I already had 6 GB+ RES; to rule out all other factors, I simplified the program to the version above.
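For reference, setting maxtasksperchild only changes the Pool constructor; the limit of 100 below is arbitrary:

# Each worker process is retired and replaced after 100 tasks (arbitrary limit),
# so a child's memory is periodically returned to the OS.
pool = multiprocessing.Pool(processes=16, maxtasksperchild=100)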
EDIT:
It turned out this version works better; thanks for everyone's input:
#!/usr/bin/python
import multiprocessing

ready_list = []

def dummy_func(index):
    global ready_list
    ready_list.append(index)

def worker(index):
    return index

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=16)
    result = {}
    for index in range(0, 1000000):
        result[index] = pool.apply_async(worker, (index,), callback=dummy_func)
        for ready in ready_list:
            result[ready].wait()
            del result[ready]
        ready_list = []
    # clean up
    pool.close()
    pool.join()
I didn't put any lock there, as I believe the main process is single-threaded (the callback is more or less an event-driven thing, per the docs I read).
I changed v1's index range to 1,000,000, same as v2, and ran some tests. It's weird to me that v2 is even ~10% faster than v1 (33s vs 37s); maybe v1 was doing too much internal list maintenance. v2 is definitely the winner on memory usage: it never went over 300M (VIRT) and 50M (RES), while v1 used to hit 370M/120M, with 330M/85M at best. All numbers come from only 3-4 test runs, so treat them as reference only.
Python does not free memory back to the system immediately after it destroys an object instance. It has object pools, called arenas, and it takes a while until those are released. In some cases you may be suffering from memory fragmentation, which also causes the process's memory usage to grow.
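A rough way to see the fragmentation effect on Linux (reading RSS from /proc is a platform assumption; the numbers are illustrative):

def rss_kb():
    # Linux-only: resident set size in kB from /proc/self/status
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

print('start:    ', rss_kb())
data = [object() for _ in range(10**6)]   # fill many pymalloc arenas
print('filled:   ', rss_kb())
keep = data[::1000]                        # keep a scattered 0.1% of the objects alive
del data                                   # the rest are destroyed...
print('after del:', rss_kb())              # ...but partly-used arenas cannot be returned to the OS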
The multiprocessing.shared_memory module (new in Python 3.8) provides a class, SharedMemory, for the allocation and management of shared memory to be accessed by one or more processes on a multicore or symmetric multiprocessor (SMP) machine.
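A minimal sketch of that API (Python 3.8+), run in a single process here just to keep it short:

from multiprocessing import shared_memory

# Create a 16-byte block and write into it through the memoryview.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b'hello'

# Another process would attach to the same block by name.
other = shared_memory.SharedMemory(name=shm.name)
print(bytes(other.buf[:5]))   # b'hello'

other.close()
shm.close()
shm.unlink()                  # release the block once every process is done with it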
Using Pool: the Pool class in multiprocessing can handle an enormous number of tasks. It allows you to run multiple jobs per worker process, thanks to its ability to queue the jobs. Memory is allocated only for the executing worker processes, unlike the Process class, which allocates a process (and its memory) per task.
As we have seen, Pool keeps only the executing worker processes in memory, while Process keeps a process for every task in memory; so when the number of tasks is small we can use the Process class, and when the number of tasks is large we can use Pool.
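As a rough sketch of the contrast (the square function and the counts here are just illustrative):

from multiprocessing import Pool, Process

def square(n):
    return n * n

if __name__ == '__main__':
    # Pool: 4 worker processes work through a queue of 1000 tasks.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(1000))

    # Process: one OS process per task -- fine for a handful of tasks,
    # but the same 1000 tasks would mean spawning 1000 processes.
    procs = [Process(target=square, args=(n,)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()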
I had memory issues recently, since I was calling the multiprocessing function multiple times, so it kept spawning processes and leaving them in memory.
Here's the solution I'm using now:
def myParallelProcess(ahugearray):
    from multiprocessing import Pool
    from contextlib import closing
    with closing(Pool(15)) as p:
        res = p.imap_unordered(simple_matching, ahugearray, 100)
        return res
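simple_matching is not shown above; a hypothetical stand-in and the calling pattern would look roughly like this (the function must live at module level so it can be pickled):

def simple_matching(item):
    # hypothetical per-item function, only here to make the sketch runnable
    return item * 2

if __name__ == '__main__':
    for value in myParallelProcess(range(100000)):
        pass  # consume the unordered results as they arrive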