
Can I find out the allocation request that caused my Python MemoryError?

Context

My small Python script uses a library to work on some relatively large data. The standard algorithm for this task is a dynamic programming algorithm, so presumably the library "under the hood" allocates a large array to keep track of the partial results of the DP. Indeed, when I try to give it fairly large input, it immediately gives a MemoryError.

Preferably without digging into the depths of the library, I want to figure out if it is worth trying this algorithm on a different machine with more memory, or trying to trim down a bit on my input size, or if it's a lost cause for the data size I am trying to use.

Question

When my Python code throws a MemoryError, is there a "top-down" way for me to find out how much memory my code tried to allocate when the error occurred, e.g. by inspecting the error object?

Mees de Vries asked Sep 20 '18 11:09




3 Answers

You can't tell this from the MemoryError exception. The exception is raised in any situation where a memory allocation failed, including allocations by Python internals that have no direct connection to code creating new Python data structures. Some modules create locks or other support objects, and those operations can also fail when memory has run out.

You also can't necessarily know how much memory the whole operation would need to succeed. If the library creates several data structures over the course of the operation, the last straw could be allocating a string used as a dictionary key, or copying a whole existing data structure for mutation, or anything in between. Whichever allocation fails, it says nothing about how much additional memory the remainder of the process would need.

That said, Python can give you detailed information on what memory allocations are being made, and when, and where, using the tracemalloc module. Using that module and an experimental approach, you could estimate how much memory your data set would require to complete.

The trick is to find data sets for which the process can be completed. You'd want to find data sets of different sizes, and you can then measure how much memory those data structures require. You'd create snapshots before and after with tracemalloc.take_snapshot(), compare differences and statistics between the snapshots for those data sets, and perhaps you can extrapolate from that information how much more memory your larger data set would need. It depends, of course, on the nature of the operation and the datasets, but if there is any kind of pattern tracemalloc is your best shot to discover it.
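As a minimal sketch of that experiment: the block below runs a stand-in workload at a few input sizes and records the peak traced memory for each. The function library_call and the chosen sizes are hypothetical placeholders for the real library routine and your real data sets. It reads the peak from tracemalloc.get_traced_memory() rather than diffing snapshots, on the assumption that the peak is what decides whether a MemoryError occurs. Note that tracemalloc only sees allocations made through Python's memory allocators, so memory that a C extension obtains directly from the system will not show up.

```python
import tracemalloc


def library_call(n):
    # Hypothetical stand-in for the library routine: builds an n x n DP table.
    return [[0] * n for _ in range(n)]


def peak_bytes(func, *args):
    """Run func(*args) under tracemalloc and return the peak traced size in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Stopping also clears the traces, so each measurement starts fresh.
        tracemalloc.stop()
    return peak


sizes = [100, 200, 400]  # assumed input sizes that are known to complete
peaks = [peak_bytes(library_call, n) for n in sizes]
for n, p in zip(sizes, peaks):
    print(f"n={n}: peak {p / 1e6:.2f} MB")

# Growth factor of the peak when n doubles; a value near 4 hints at O(n^2) memory.
ratio = peaks[-1] / peaks[-2]
print(f"growth factor when n doubles: {ratio:.2f}")
```

If the growth factor stays stable across several doublings, you can multiply it out to get a rough estimate for the input size that currently fails, and compare that against the memory available on another machine.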

Martijn Pieters answered Oct 05 '22 06:10


You can see the memory allocations with Pympler, but you will need to add the debugging statements locally in the library that you are using. Assuming a standard PyPI package, here are the steps:

  1. Clone the package locally.

  2. Use the summary module of Pympler. Place the following inside the main recursion method:

   from pympler import muppy, summary

   def data_intensive_method(data_xyz):
       # Collect the objects Pympler can currently see, then print a per-type summary
       all_objects = muppy.get_objects()
       sum1 = summary.summarize(all_objects)
       summary.print_(sum1)
       ...

  3. Run pip install -e . to install the edited package locally.
  4. Run your main program and check the console for memory usage at each iteration.
amirathi answered Oct 05 '22 04:10


It appears that MemoryError is not created with any associated data:

def crash():
    x = 32 * 10 ** 9
    return 'a' * x

try:
    crash()
except MemoryError as e:
    print(vars(e))  # prints: {}

This makes sense: how could it carry extra data if no memory is left to store it?

I don't think there's an easy way out. You can start from the traceback that the MemoryError causes and investigate with a debugger or use a memory profiler like pympler (or psutil as suggested in the comments).
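To illustrate "starting from the traceback", here is a rough sketch that walks the exception's traceback to the innermost Python frame and prints its local variables, which often include the size that was being requested. The crash function and the value 2 ** 62 are hypothetical, and the example assumes a 64-bit CPython build, where a request of that size is refused by the allocator right away. If the failure happens inside a C extension, the innermost Python frame is only the caller of that extension, so the locals show its inputs rather than the exact allocation.

```python
def crash(n):
    # Hypothetical stand-in: request a string far larger than any real machine can hold.
    return "a" * n


def innermost_frame(exc):
    """Follow the traceback chain to the frame where the exception was raised."""
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    return tb.tb_frame


try:
    crash(2 ** 62)
except MemoryError as e:
    frame = innermost_frame(e)
    print("raised in:", frame.f_code.co_name, "at line", frame.f_lineno)
    # Copy the locals into a plain dict so they print the same on all versions.
    print("locals:", dict(frame.f_locals))
```

The same information is available interactively by running the script under a debugger and using post-mortem inspection once the error is raised.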

roeen30 answered Oct 05 '22 06:10