Can I find out the allocation request that caused my Python MemoryError?

Context

My small Python script uses a library to work on some relatively large data. The standard algorithm for this task is a dynamic programming algorithm, so presumably the library "under the hood" allocates a large array to keep track of the partial results of the DP. Indeed, when I try to give it fairly large input, it immediately gives a MemoryError.

Preferably without digging into the depths of the library, I want to figure out if it is worth trying this algorithm on a different machine with more memory, or trying to trim down a bit on my input size, or if it's a lost cause for the data size I am trying to use.

Question

When my Python code throws a MemoryError, is there a "top-down" way for me to investigate what the size of memory was that my code tried to allocate which caused the error, e.g. by inspecting the error object?

358

asked Sep 20 '18 11:09

Mees de Vries

3 Answers

You can't see from the MemoryError exception, and the exception is raised for any situation where memory allocation failed, including Python internals that do not directly connect to code creating new Python data structures; some modules create locks or other support objects and those operations can fail due to memory having run out.

You also can't necessarily know how much memory would be required to have the whole operation succeed. If the library creates several data structures over the course of operation, trying to allocate memory for a string used as a dictionary key could be the last straw, or it could be copying the whole existing data structure for mutation, or anything in between, but this doesn't say anything about how much memory is going to be needed, in addition, for the remainder of the process.

That said, Python can give you detailed information on what memory allocations are being made, and when, and where, using the tracemalloc module. Using that module and an experimental approach, you could estimate how much memory your data set would require to complete.

The trick is to find data sets for which the process can be completed. You'd want to find data sets of different sizes, and you can then measure how much memory those data structures require. You'd create snapshots before and after with tracemalloc.take_snapshot(), compare differences and statistics between the snapshots for those data sets, and perhaps you can extrapolate from that information how much more memory your larger data set would need. It depends, of course, on the nature of the operation and the datasets, but if there is any kind of pattern tracemalloc is your best shot to discover it.

100

answered Oct 05 '22 06:10

Martijn Pieters

You can see the memory allocation with Pyampler but you will need to add the debugging statements locally in the library that you are using. Assuming a standard PyPi package, here are the steps:

Clone the package locally.

2 Use summary module of Pyampler. Place following inside the main recursion method,

   from pympler import summary
   def data_intensive_method(data_xyz)
       sum1 = summary.summarize(all_objects)
       summary.print_(sum1)
       ...

Run pip install -e . to install the edited package locally.
Run your main program and check the console for memory usage at each iteration.

answered Oct 05 '22 04:10

amirathi

It appears that MemoryError is not created with any associated data:

def crash():
    x = 32 * 10 ** 9
    return 'a' * x

try:
    crash()
except MemoryError as e:
    print(vars(e))  # prints: {}

This makes sense - how could it if no memory is left?

I don't think there's an easy way out. You can start from the traceback that the MemoryError causes and investigate with a debugger or use a memory profiler like pympler (or psutil as suggested in the comments).

answered Oct 05 '22 06:10

roeen30

Related questions
                            
                                How to efficiently get count for item in list of lists in python
                            
                                Using web.py as non blocking http-server
                            
                                Autoload in Python
                            
                                How to schedule hundreds of thousands of tasks?
                            
                                Deploying Django (fastcgi, apache mod_wsgi, uwsgi, gunicorn)
                            
                                When where and how can i change the __class__ attr of an object?
                            
                                How do I change a value while debugging python with pdb?
                            
                                How to integrate any Python lint with GitHub commit status API?
                            
                                Autogenerate documentation for Python project using setuptools
                            
                                Get Celery to Use Django Test DB
                            
                                How to mock Python static methods and class methods
                            
                                Sphinx :ivar tag goes looking for cross-references
                            
                                Why does Python not implement the elif statement on try statement?
                            
                                How to get access of individual trees of a xgboost model in python /R
                            
                                Sklearn Pipeline - How to inherit get_params in custom Transformer (not Estimator)
                            
                                Alexa request validation in python
                            
                                Exit pdb Interactive Mode from Jupyter Notebook
                            
                                Transfer learning with tf.estimator.Estimator framework
                            
                                Twine upload TypeError: expected string or bytes-like object
                            
                                How to check if a function was called in a unit test using pytest-mock?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Can I find out the allocation request that caused my Python MemoryError?

Tags:

python

python-3.x

error-handling

out-of-memory