 

How do I use subprocesses to force Python to release memory?

I was reading up on Python Memory Management and would like to reduce the memory footprint of my application. It was suggested that subprocesses would go a long way in mitigating the problem, but I'm having trouble conceptualizing what needs to be done. Could someone please provide a simple example of how to turn this...

def my_function():
    x = range(1000000)
    y = copy.deepcopy(x)
    del x
    return y

@subprocess_witchcraft
def my_function_dispatcher(*args):
    return my_function()

...into a real subprocessed function that doesn't store an extra "free-list"?

Bonus Question:

Does this "free-list" concept apply to Python C extensions as well?

Noob Saibot asked May 29 '14 15:05



1 Answer

The important thing about the optimization suggestion is to make sure that my_function() is only invoked in a subprocess. The deepcopy and del are irrelevant — once you create five million distinct integers in a process, holding onto all of them at the same time, it's game over. Even if you stop referring to those objects, Python will not free them: it keeps references to five million empty integer-object-sized fields in a limbo where they await reuse by the next function that wants to create five million integers. This is the free list mentioned in your question, and it buys blindingly fast allocation and deallocation of ints and floats. It is only fair to Python to note that this is not a memory leak, since the memory is definitely made available for further allocations. However, that memory will not get returned to the system until the process ends, nor will it be reused for anything other than allocating numbers of the same type.
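The general effect — freed number objects become reusable inside the process without the process footprint shrinking — can be observed with sys.getallocatedblocks(), a CPython-specific counter of live allocator blocks (the exact mechanism differs between Python versions, so treat this as a rough sketch):

```python
import sys

base = sys.getallocatedblocks()
# 200,000 distinct int objects (values kept above the small-int cache range)
data = [i * i + 1000 for i in range(200_000)]
during = sys.getallocatedblocks()
del data
after = sys.getallocatedblocks()

# The allocator sees the blocks come back immediately after the del,
# even though the OS-level footprint of the process typically does not shrink.
print(during - base, during - after)
```

Both differences come out in the vicinity of 200,000 blocks: the interpreter's allocator reclaims the objects at once, but that says nothing about memory being returned to the operating system.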

Most programs don't have this problem because most programs do not create pathologically huge lists of numbers, free them, and then expect to reuse that memory for other objects. Programs using numpy are also safe because numpy stores the numeric data of its arrays in tightly packed native format. For programs that do follow this usage pattern, the way to mitigate the problem is to avoid creating a large number of integers at the same time in the first place, at least not in the process which needs to return memory to the system. It is unclear what exact use case you have, but a real-world solution will likely require more than a "magic decorator".
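The gap between a list of Python int objects and packed native storage is easy to measure. Here is a small sketch using the stdlib array module as a stand-in for numpy's packed arrays:

```python
import sys
from array import array

n = 100_000

# A Python list holds n pointers, each pointing at a separate int object.
lst = list(range(n))
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

# array('q') stores the values as raw signed 8-byte machine integers,
# back to back, with no per-value object overhead.
packed = array('q', range(n))
packed_bytes = packed.buffer_info()[1] * packed.itemsize

print(list_bytes, packed_bytes)
```

On CPython the list plus its int objects comes out several times larger than the packed buffer, which is why numpy-style storage sidesteps the problem entirely.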

This is where subprocesses come in: if the list of numbers is created in another process, then all the memory associated with the list, including but not limited to the storage of the ints, is both freed and returned to the system by the mere act of terminating the subprocess. Of course, you must design your program so that the list can be both created and processed in the subprocess, without requiring the transfer of all these numbers. The subprocess can receive the information needed to create the data set, and can send back the information obtained from processing the list.

To illustrate the principle, let's upgrade your example so that the whole list actually needs to exist — say we're benchmarking sorting algorithms. We want to create a huge list of integers, sort it, and reliably free the memory associated with the list, so that the next benchmark can allocate memory for its own needs without worrying about running out of RAM. To spawn the subprocess and communicate, this uses the multiprocessing module:

# To run this, save it to a file that looks like a valid Python module, e.g.
# "foo.py" - multiprocessing requires being able to import the main module.
# Then run it with "python foo.py".

import multiprocessing, random, sys, os, time

def create_list(size):
    # utility function for clarity - runs in subprocess
    maxint = sys.maxint
    randrange = random.randrange
    return [randrange(maxint) for i in xrange(size)]

def run_test(state):
    # this function is run in a separate process
    size = state['list_size']
    print 'creating a list with %d random elements - this can take a while... ' % size,
    sys.stdout.flush()
    lst = create_list(size)
    print 'done'
    t0 = time.time()
    lst.sort()
    t1 = time.time()
    state['time'] = t1 - t0

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    state = manager.dict(list_size=5*1000*1000)  # shared state
    p = multiprocessing.Process(target=run_test, args=(state,))
    p.start()
    p.join()
    print 'time to sort: %.3f' % state['time']
    print 'my PID is %d, sleeping for a minute...' % os.getpid()
    time.sleep(60)
    # at this point you can inspect the running process to see that it
    # does not consume excess memory
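The example above is Python 2 (sys.maxint, xrange, print statements). A rough Python 3 equivalent of the same structure, shrunk to one million elements and without the demonstration sleep, might look like this:

```python
# Python 3 sketch of the same idea: build and sort the list entirely in a
# child process; the memory is returned to the OS when the child exits.
import multiprocessing
import random
import sys
import time

def create_list(size):
    # runs in the subprocess
    return [random.randrange(sys.maxsize) for _ in range(size)]

def run_test(state):
    # this function is run in a separate process; it writes its result
    # into the shared 'state' mapping instead of returning the huge list
    lst = create_list(state['list_size'])
    t0 = time.time()
    lst.sort()
    state['time'] = time.time() - t0

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    state = manager.dict(list_size=1_000_000)  # shared state
    p = multiprocessing.Process(target=run_test, args=(state,))
    p.start()
    p.join()
    print('time to sort: %.3f' % state['time'])
```

Only the small 'time' value crosses the process boundary; the million-element list lives and dies inside the child.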

Bonus Answer

It is hard to provide an answer to the bonus question, since the question is unclear. The "free list concept" is exactly that: a concept, an implementation strategy that needs to be explicitly coded on top of the regular Python allocator. Most Python types do not use that allocation strategy; for example, it is not used for instances of classes created with the class statement. Implementing a free list is not hard, but it is fairly advanced and rarely undertaken without good reason. If some extension author has chosen to use a free list for one of their types, it can be expected that they are aware of the tradeoff a free list offers — gaining extra-fast allocation/deallocation at the cost of some additional space (for the objects on the free list and the free list itself) and inability to reuse the memory for anything else.
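To make the concept concrete, here is what a free list amounts to, sketched in pure Python (C extensions do the same thing with raw malloc'd structs inside tp_new/tp_dealloc; the class and method names here are invented for illustration):

```python
class Node:
    """A type with a hand-rolled free list: released instances are parked
    for reuse instead of being handed back to the allocator."""

    _free_list = []       # recycled instances wait here
    _free_list_max = 80   # cap, so the cache cannot grow without bound

    def __init__(self, value):
        self.value = value

    @classmethod
    def acquire(cls, value):
        # reuse a recycled instance if one is available - this is the
        # "extra-fast allocation" a free list buys
        if cls._free_list:
            node = cls._free_list.pop()
            node.__init__(value)
            return node
        return cls(value)

    @classmethod
    def release(cls, node):
        # instead of letting the object be deallocated, park it for reuse;
        # its memory stays claimed by this type until the process ends
        if len(cls._free_list) < cls._free_list_max:
            cls._free_list.append(node)
```

The tradeoff described above is visible directly: acquire after release hands back the very same object, and the parked instances occupy memory that nothing else can use.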

user4815162342 answered Sep 19 '22 08:09