Please refer to the execution below:
import sys
import numpy as np
_list = [2,55,87]
print(f'1 - Memory used by Python List - {sys.getsizeof(_list)}')
narray = np.array([2,55,87])
size = narray.size * narray.itemsize
print(f'2 - Memory usage of np array using itemsize - {size}')
print(f'3 - Memory usage of np array using getsizeof - {sys.getsizeof(narray)}')
Here is what I get as a result:
1 - Memory used by Python List - 80
2 - Memory usage of np array using itemsize - 12
3 - Memory usage of np array using getsizeof - 116
One calculation suggests the NumPy array consumes far less memory, but the other says it consumes more than a regular Python list. Shouldn't I be using sys.getsizeof with a NumPy array? What am I doing wrong here?
Edit - I just checked: an empty Python list consumes 56 bytes, whereas an empty NumPy array consumes 104. Is this space used for pointing to the associated built-in methods and attributes?
As the array size increases, NumPy gets around 30 times faster than a Python list. Because a NumPy array is densely packed in memory due to its homogeneous type, it also frees memory faster.
NumPy uses much less memory to store data. NumPy arrays take significantly less memory than Python lists. NumPy also provides a mechanism for specifying the data type of the contents, which allows further optimisation of the code.
The answer is performance. NumPy data structures perform better in two ways. Size: NumPy data structures take up less space. Speed: they are faster than lists.
Elements of a list need not be contiguous in memory. Below are some examples which clearly demonstrate how Numpy arrays are better than Python lists by analyzing the memory consumption, execution time comparison, and operations supported by both of them. In this example, a Python list and a Numpy array of size 1000 will be created.
It is open-source, easy to use, memory friendly, and lightning-fast. Originally known as ‘Numeric,’ NumPy sets the framework for many data science libraries like SciPy, scikit-learn, pandas, and more. While Python lists store a collection of ordered, alterable data objects, NumPy arrays only store a single type of object.
Size of each element of list in bytes: 48
Size of the whole list in bytes: 48000
Size of each element of the Numpy array in bytes: 8
Size of the whole Numpy array in bytes: 8000
In this example, 2 Python lists and 2 Numpy arrays will be created and each container has 1000000 elements.
Before we discuss a case where NumPy arrays become slow like snails, it is worthwhile to verify the assumption that NumPy arrays are generally faster than lists. To do that, we will calculate the mean of 1 million element array using both NumPy and lists. The array is randomly generated.
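The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the original article's code: the seed, the variable names, and the repeat count of 10 are assumptions, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

# Randomly generated data: one million floats in [0, 1).
rng = np.random.default_rng(0)          # seed chosen arbitrarily
narray = rng.random(1_000_000)
mylist = narray.tolist()                # same values as plain Python floats

# Both approaches should agree on the mean (up to rounding).
list_mean = sum(mylist) / len(mylist)
array_mean = narray.mean()

# Time 10 repetitions of each calculation.
list_time = timeit.timeit(lambda: sum(mylist) / len(mylist), number=10)
array_time = timeit.timeit(lambda: narray.mean(), number=10)

print(f"list mean : {list_mean:.6f} in {list_time:.4f} s")
print(f"array mean: {array_mean:.6f} in {array_time:.4f} s")
```

Only the two means are expected to match; how much faster the array version runs depends on the hardware and the NumPy build.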
The calculation using:
size = narray.size * narray.itemsize
does not include the memory consumed by non-element attributes of the array object. This can be verified from the documentation of ndarray.nbytes:
>>> x = np.zeros((3,5,2), dtype=np.complex128)
>>> x.nbytes
480
>>> np.prod(x.shape) * x.itemsize
480
In that documentation, it can be read that ndarray.nbytes:
Does not include memory consumed by non-element attributes of the array object.
Note that from the code above you can conclude that your calculation excludes non-element attributes, given that the value is equal to the one from ndarray.nbytes.
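To make the difference concrete, here is a small sketch that separates the element data from the rest of the array object. It assumes the array owns its data buffer (true for arrays created with np.array or np.zeros); the variable names are illustrative, and the exact overhead depends on the NumPy version and platform.

```python
import sys
import numpy as np

narray = np.array([2, 55, 87])

# Bytes used by the elements only (identical to size * itemsize).
data_bytes = narray.nbytes
# Bytes reported for the whole array object: header plus the data buffer,
# because this array owns its data.
total_bytes = sys.getsizeof(narray)
overhead = total_bytes - data_bytes

# A much larger 1-D array of the same dtype carries the same fixed overhead.
big = np.zeros(1000, dtype=narray.dtype)
big_overhead = sys.getsizeof(big) - big.nbytes

print(f"element data: {data_bytes} bytes")
print(f"getsizeof   : {total_bytes} bytes")
print(f"overhead    : {overhead} bytes (small array), {big_overhead} bytes (large array)")
```

The overhead is a fixed cost per array, which is why it dominates for three elements and becomes negligible for large arrays.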
A list of the non-element attributes can be found in the section Array Attributes, included here for completeness:
ndarray.flags Information about the memory layout of the array.
ndarray.shape Tuple of array dimensions.
ndarray.strides Tuple of bytes to step in each dimension when traversing an array.
ndarray.ndim Number of array dimensions.
ndarray.data Python buffer object pointing to the start of the array’s data.
ndarray.size Number of elements in the array.
ndarray.itemsize Length of one array element in bytes.
ndarray.nbytes Total bytes consumed by the elements of the array.
ndarray.base Base object if memory is from some other object.
With regard to sys.getsizeof, it can be read in the documentation (emphasis mine) that:
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
A search on [numpy]getsizeof produces many potential duplicates.
The basic points are:
a list is a container, and the getsizeof docs warn us that it returns only the size of the container, not the size of the elements that it references. So by itself it is an unreliable measure of the total size of a list (or tuple or dict).
getsizeof is a fairly good measure of arrays, if you take into account the roughly 100 bytes of "overhead". That overhead will be a big part of a small array, and a minor thing when looking at a large one. nbytes is the simpler way of judging array memory use.
But for views, the data buffer is shared with the base, and doesn't count when using getsizeof.
object dtype arrays contain references like lists, so the same getsizeof caution applies.
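The points about views and object dtype arrays can be checked with a short sketch. The array sizes and names here are arbitrary choices for illustration, and the header size printed will differ between NumPy versions.

```python
import sys
import numpy as np

base = np.zeros(1000)     # float64 array that owns its data buffer
view = base[::2]          # view: shares the buffer owned by base

print("base getsizeof:", sys.getsizeof(base))   # header + 8000 bytes of data
print("view getsizeof:", sys.getsizeof(view))   # header only, buffer not counted
print("view nbytes   :", view.nbytes)           # 500 elements * 8 bytes
print("view.base is base:", view.base is base)

# An object dtype array stores references, much like a list does.
obj = np.array([2, 55, 87], dtype=object)
print("object array nbytes:", obj.nbytes)       # bytes used by the references only
```

For the view, getsizeof reports less than nbytes because the shared buffer is attributed to the base array.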
Overall I think understanding how arrays and lists are stored is a more useful way of judging their respective memory use. Focus more on the computational efficiency than memory use. For small stuff, and iterative uses, lists are better. Arrays are best when they are large, and you use array methods to do the calculations.
Because numpy arrays have shapes, strides, and other member variables that define the data layout, it is reasonable that they (might) require some extra memory for this!
A list, on the other hand, has no specific type, or shape, etc.
Although, if you start appending elements on a list instead of simply writing them as an array, and also go to larger numbers of elements, e.g. 1e7, you will see different behaviour!
Example case:
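import numpy as np
import sys

N = int(1e7)
narray = np.zeros(N)

# Build the list by appending one element at a time.
mylist = []
for i in range(N):
    mylist.append(narray[i])

# Note: for the list, getsizeof counts only the container, not the elements.
print("size of np.array:", sys.getsizeof(narray))
print("size of list :", sys.getsizeof(mylist))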
On my (ASUS) Ubuntu 20.04 PC I get:
size of np.array: 80000104
size of list : 81528048
Note that the memory footprint is not the only thing that matters for an application's efficiency! The data layout is sometimes far more important.