 

How do I use subprocesses to force Python to release memory?

I was reading up on Python Memory Management and would like to reduce the memory footprint of my application. It was suggested that subprocesses would go a long way in mitigating the problem, but I'm having trouble conceptualizing what needs to be done. Could someone please provide a simple example of how to turn this...

def my_function():
    x = range(1000000)
    y = copy.deepcopy(x)
    del x
    return y

@subprocess_witchcraft
def my_function_dispatcher(*args):
    return my_function()

...into a real subprocessed function that doesn't store an extra "free-list"?

Bonus Question:

Does this "free-list" concept apply to Python C extensions as well?

Noob Saibot asked May 29 '14 15:05



1 Answer

The important thing about the optimization suggestion is to make sure that my_function() is only invoked in a subprocess. The deepcopy and del are irrelevant — once you create five million distinct integers in a process, holding onto all of them at the same time, it's game over. Even if you stop referring to those objects, Python will not free them: it keeps references to five million empty integer-object-sized fields in a limbo where they await reuse by the next function that wants to create five million integers. This is the free list mentioned in your question, and it buys blindingly fast allocation and deallocation of ints and floats. It is only fair to Python to note that this is not a memory leak, since the memory is definitely made available for further allocations. However, that memory will not get returned to the system until the process ends, nor will it be reused for anything other than allocating numbers of the same type.
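The general effect — freed number objects become reusable inside the process without the process footprint shrinking — can be observed with sys.getallocatedblocks(), a CPython-specific counter of live allocator blocks (the exact mechanism differs between Python versions, so treat this as a rough sketch):

```python
import sys

base = sys.getallocatedblocks()
# 200,000 distinct int objects (values kept above the small-int cache range)
data = [i * i + 1000 for i in range(200_000)]
during = sys.getallocatedblocks()
del data
after = sys.getallocatedblocks()

# The allocator sees the blocks come back immediately after the del,
# even though the OS-level footprint of the process typically does not shrink.
print(during - base, during - after)
```

Both differences come out in the vicinity of 200,000 blocks: the interpreter's allocator reclaims the objects at once, but that says nothing about memory being returned to the operating system.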

Most programs don't have this problem because most programs do not create pathologically huge lists of numbers, free them, and then expect to reuse that memory for other objects. Programs using numpy are also safe because numpy stores the numeric data of its arrays in tightly packed native format. For programs that do follow this usage pattern, the way to mitigate the problem is to avoid creating a large number of integers at the same time in the first place, at least not in the process which needs to return memory to the system. It is unclear what exact use case you have, but a real-world solution will likely require more than a "magic decorator".
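The gap between a list of Python int objects and packed native storage is easy to measure. Here is a small sketch using the stdlib array module as a stand-in for numpy's packed arrays:

```python
import sys
from array import array

n = 100_000

# A Python list holds n pointers, each pointing at a separate int object.
lst = list(range(n))
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

# array('q') stores the values as raw signed 8-byte machine integers,
# back to back, with no per-value object overhead.
packed = array('q', range(n))
packed_bytes = packed.buffer_info()[1] * packed.itemsize

print(list_bytes, packed_bytes)
```

On CPython the list plus its int objects comes out several times larger than the packed buffer, which is why numpy-style storage sidesteps the problem entirely.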

This is where subprocesses come in: if the list of numbers is created in another process, then all the memory associated with the list, including but not limited to the storage of the ints, is both freed and returned to the system by the mere act of terminating the subprocess. Of course, you must design your program so that the list can be both created and processed in the subprocess, without requiring the transfer of all these numbers. The subprocess can receive the information needed to create the data set, and can send back the information obtained from processing the list.

To illustrate the principle, let's upgrade your example so that the whole list actually needs to exist — say we're benchmarking sorting algorithms. We want to create a huge list of integers, sort it, and reliably free the memory associated with the list, so that the next benchmark can allocate memory for its own needs without worrying about running out of RAM. To spawn the subprocess and communicate, this uses the multiprocessing module:

# To run this, save it to a file that looks like a valid Python module, e.g.
# "foo.py" - multiprocessing requires being able to import the main module.
# Then run it with "python foo.py".

import multiprocessing, random, sys, os, time

def create_list(size):
    # utility function for clarity - runs in subprocess
    maxint = sys.maxint
    randrange = random.randrange
    return [randrange(maxint) for i in xrange(size)]

def run_test(state):
    # this function is run in a separate process
    size = state['list_size']
    print 'creating a list with %d random elements - this can take a while... ' % size,
    sys.stdout.flush()
    lst = create_list(size)
    print 'done'
    t0 = time.time()
    lst.sort()
    t1 = time.time()
    state['time'] = t1 - t0

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    state = manager.dict(list_size=5*1000*1000)  # shared state
    p = multiprocessing.Process(target=run_test, args=(state,))
    p.start()
    p.join()
    print 'time to sort: %.3f' % state['time']
    print 'my PID is %d, sleeping for a minute...' % os.getpid()
    time.sleep(60)
    # at this point you can inspect the running process to see that it
    # does not consume excess memory
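The example above is Python 2 (sys.maxint, xrange, print statements). A rough Python 3 equivalent of the same structure, shrunk to one million elements and without the demonstration sleep, might look like this:

```python
# Python 3 sketch of the same idea: build and sort the list entirely in a
# child process; the memory is returned to the OS when the child exits.
import multiprocessing
import random
import sys
import time

def create_list(size):
    # runs in the subprocess
    return [random.randrange(sys.maxsize) for _ in range(size)]

def run_test(state):
    # this function is run in a separate process; it writes its result
    # into the shared 'state' mapping instead of returning the huge list
    lst = create_list(state['list_size'])
    t0 = time.time()
    lst.sort()
    state['time'] = time.time() - t0

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    state = manager.dict(list_size=1_000_000)  # shared state
    p = multiprocessing.Process(target=run_test, args=(state,))
    p.start()
    p.join()
    print('time to sort: %.3f' % state['time'])
```

Only the small 'time' value crosses the process boundary; the million-element list lives and dies inside the child.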

Bonus Answer

It is hard to provide an answer to the bonus question, since the question is unclear. The "free list concept" is exactly that: a concept, an implementation strategy that needs to be explicitly coded on top of the regular Python allocator. Most Python types do not use that allocation strategy; for example, it is not used for instances of classes created with the class statement. Implementing a free list is not hard, but it is fairly advanced and rarely undertaken without good reason. If some extension author has chosen to use a free list for one of their types, it can be expected that they are aware of the tradeoff a free list offers — gaining extra-fast allocation/deallocation at the cost of some additional space (for the objects on the free list and the free list itself) and inability to reuse the memory for anything else.
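To make the concept concrete, here is what a free list amounts to, sketched in pure Python (C extensions do the same thing with raw malloc'd structs inside tp_new/tp_dealloc; the class and method names here are invented for illustration):

```python
class Node:
    """A type with a hand-rolled free list: released instances are parked
    for reuse instead of being handed back to the allocator."""

    _free_list = []       # recycled instances wait here
    _free_list_max = 80   # cap, so the cache cannot grow without bound

    def __init__(self, value):
        self.value = value

    @classmethod
    def acquire(cls, value):
        # reuse a recycled instance if one is available - this is the
        # "extra-fast allocation" a free list buys
        if cls._free_list:
            node = cls._free_list.pop()
            node.__init__(value)
            return node
        return cls(value)

    @classmethod
    def release(cls, node):
        # instead of letting the object be deallocated, park it for reuse;
        # its memory stays claimed by this type until the process ends
        if len(cls._free_list) < cls._free_list_max:
            cls._free_list.append(node)
```

The tradeoff described above is visible directly: acquire after release hands back the very same object, and the parked instances occupy memory that nothing else can use.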

user4815162342 answered Sep 19 '22 08:09