My small Python script uses a library to work on some relatively large data. The standard algorithm for this task is a dynamic programming algorithm, so presumably the library "under the hood" allocates a large array to keep track of the partial results of the DP. Indeed, when I try to give it fairly large input, it immediately gives a MemoryError
.
Preferably without digging into the depths of the library, I want to figure out if it is worth trying this algorithm on a different machine with more memory, or trying to trim down a bit on my input size, or if it's a lost cause for the data size I am trying to use.
When my Python code throws a MemoryError
, is there a "top-down" way for me to investigate what the size of memory was that my code tried to allocate which caused the error, e.g. by inspecting the error object?
To fix this, all you have to do is install the 64-bit version of the Python programming language. A 64-bit computer system can access 2⁶⁴ different memory addresses or 18-Quintillion bytes of RAM. If you have a 64-bit computer system, you must use the 64-bit version of Python to play with its full potential.
A MemoryError means that the interpreter has run out of memory to allocate to your Python program. This may be due to an issue in the setup of the Python environment or it may be a concern with the code itself loading too much data at the same time.
A segfaulting program might be the symptom of a bug in C code–or it might be that your process is running out of memory. Crashing is just one symptom of running out of memory. Your process might instead just run very slowly, your computer or VM might freeze, or your process might get silently killed.
Python uses a portion of the memory for internal use and non-object memory. Another part of the memory is used for Python object such as int, dict, list, etc. CPython contains the object allocator that allocates memory within the object area. The object allocator gets a call every time the new object needs space.
You can't see from the MemoryError
exception, and the exception is raised for any situation where memory allocation failed, including Python internals that do not directly connect to code creating new Python data structures; some modules create locks or other support objects and those operations can fail due to memory having run out.
You also can't necessarily know how much memory would be required to have the whole operation succeed. If the library creates several data structures over the course of operation, trying to allocate memory for a string used as a dictionary key could be the last straw, or it could be copying the whole existing data structure for mutation, or anything in between, but this doesn't say anything about how much memory is going to be needed, in addition, for the remainder of the process.
That said, Python can give you detailed information on what memory allocations are being made, and when, and where, using the tracemalloc
module. Using that module and an experimental approach, you could estimate how much memory your data set would require to complete.
The trick is to find data sets for which the process can be completed. You'd want to find data sets of different sizes, and you can then measure how much memory those data structures require. You'd create snapshots before and after with tracemalloc.take_snapshot()
, compare differences and statistics between the snapshots for those data sets, and perhaps you can extrapolate from that information how much more memory your larger data set would need. It depends, of course, on the nature of the operation and the datasets, but if there is any kind of pattern tracemalloc
is your best shot to discover it.
You can see the memory allocation with Pyampler but you will need to add the debugging statements locally in the library that you are using. Assuming a standard PyPi package, here are the steps:
2 Use summary module of Pyampler. Place following inside the main recursion method,
from pympler import summary
def data_intensive_method(data_xyz)
sum1 = summary.summarize(all_objects)
summary.print_(sum1)
...
pip install -e .
to install the edited package locally.It appears that MemoryError
is not created with any associated data:
def crash():
x = 32 * 10 ** 9
return 'a' * x
try:
crash()
except MemoryError as e:
print(vars(e)) # prints: {}
This makes sense - how could it if no memory is left?
I don't think there's an easy way out. You can start from the traceback that the MemoryError
causes and investigate with a debugger or use a memory profiler like pympler (or psutil as suggested in the comments).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With