
Why does comparison of a numpy array with a list consume so much memory?

This bit me recently. I worked around it by removing all comparisons of numpy arrays with lists from the code. But why does the garbage collector fail to reclaim the memory?

Run this and watch it eat your memory:

import numpy as np
r = np.random.rand(2)   
l = []
while True:
    r == l

Running on 64-bit Ubuntu 10.04, virtualenv 1.7.2, Python 2.7.3, NumPy 1.6.2.

asked Sep 17 '12 14:09 by Hauke



1 Answer

Just in case someone stumbles on this and wonders...

@Dugal yes, I believe this is a memory leak in current numpy versions (Sept. 2012) that occurs when some exceptions are raised (see this and this). Why adding the gc call that @BiRico suggested "fixes" it seems weird to me, though apparently it must be done right afterwards? Maybe it's an oddity in how Python garbage-collects tracebacks. If someone knows the CPython internals of exception handling and garbage collection, I would be interested.

Workaround: This is not specific to lists. Most broadcasting exceptions trigger it: the empty list does not fit the array's shape, and comparing with an empty array results in the same leak. (Note that internally an exception is prepared that never surfaces.) So as a workaround, you should probably just check first that the shapes are compatible, at least if you do the comparison a lot. Otherwise I wouldn't really worry; this leaks just a small string, if I got it right.
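For anyone who wants the shape check spelled out, here is a minimal sketch. It assumes an exact shape match is what you want, which is stricter than broadcasting (a scalar-vs-array comparison would be skipped too). `compare_if_same_shape` is a made-up helper name, not part of numpy.

```python
import numpy as np

def compare_if_same_shape(arr, other):
    """Hypothetical helper: elementwise == only when the shapes match exactly.

    Returns None when the shapes differ, so the mismatched comparison
    (the case reported to leak in numpy 1.6.2) is never attempted.
    """
    if np.shape(arr) != np.shape(other):
        return None
    return arr == other

r = np.random.rand(2)
l = []

# Bounded loop standing in for the `while True` in the question.
for _ in range(1000):
    result = compare_if_same_shape(r, l)  # shapes (2,) vs (0,): skipped
```

The `None` return is only one possible convention; returning `False` or raising explicitly would work just as well, depending on what the calling code expects.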

FIXED: This issue will be fixed with numpy 1.7.

answered Sep 29 '22 03:09 by seberg