Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does NumPy array really take less memory than python list?

Please refer to below execution -

import sys

_list = [2,55,87]
print(f'1 - Memory used by Python List - {sys.getsizeof(_list)}')
      
narray = np.array([2,55,87])
size = narray.size * narray.itemsize
print(f'2 - Memory usage of np array using itemsize  - {size}')
print(f'3 - Memory usage of np array using getsizeof  - {sys.getsizeof(narray)}')

Here is what I get in result

1 - Memory used by Python List - 80
2 - Memory usage of np array using itemsize  - 12
3 - Memory usage of np array using getsizeof  - 116

One way of calculation suggests numpy array is consuming way too less memory but other says it is consuming more than regular python list? Shouldn't I be using getSizeOf with numpy array. What I am doing wrong here?

Edit - I just checked, an empty python list is consuming 56 bytes whereas an empty np array 104. Is this space being used in pointing to associated built-in methods and attributes?

like image 562
KnowledgeSeeeker Avatar asked Nov 27 '21 07:11

KnowledgeSeeeker


People also ask

Are NumPy arrays more memory efficient than lists?

As the array size increase, Numpy gets around 30 times faster than Python List. Because the Numpy array is densely packed in memory due to its homogeneous type, it also frees the memory faster.

Do NumPy arrays take up less memory as compared to Python lists?

Because numpy arrays have shapes, strides, and other member variables that define the data layout it is reasonable that (might) require some extra memory for this! A list on the other hand has no specific type, or shape, etc.

Do NumPy arrays use less memory?

NumPy uses much less memory to store data The NumPy arrays takes significantly less amount of memory as compared to python lists. It also provides a mechanism of specifying the data types of the contents, which allows further optimisation of the code.

Is Python NumPy array better than lists?

The answer is performance. Numpy data structures perform better in: Size - Numpy data structures take up less space. Performance - they have a need for speed and are faster than lists.

Why are NumPy arrays better than Python lists?

Elements of a list need not be contiguous in memory. Below are some examples which clearly demonstrate how Numpy arrays are better than Python lists by analyzing the memory consumption, execution time comparison, and operations supported by both of them. In this example, a Python list and a Numpy array of size 1000 will be created.

What is the difference between Python and NumPy?

It is open-source, easy to use, memory friendly, and lightning-fast. Originally known as ‘Numeric,’ NumPy sets the framework for many data science libraries like SciPy, Scikit-Learn, Panda, and more. While Python lists store a collection of ordered, alterable data objects, NumPy arrays only store a single type of object.

What is the size of a NumPy array in bytes?

Size of each element of list in bytes: 48 Size of the whole list in bytes: 48000 Size of each element of the Numpy array in bytes: 8 Size of the whole Numpy array in bytes: 8000 In this example, 2 Python lists and 2 Numpy arrays will be created and each container has 1000000 elements.

Are NumPy arrays slow like snails?

Before we discuss a case where NumPy arrays become slow like snails, it is worthwhile to verify the assumption that NumPy arrays are generally faster than lists. To do that, we will calculate the mean of 1 million element array using both NumPy and lists. The array is randomly generated.


Video Answer


3 Answers

The calculation using:

size = narray.size * narray.itemsize

does not include the memory consumed by non-element attributes of the array object. This can be verified by the documentation of ndarray.nbytes:

>>> x = np.zeros((3,5,2), dtype=np.complex128)
>>> x.nbytes
480
>>> np.prod(x.shape) * x.itemsize
480

In the above link, it can be read that ndarray.nbytes:

Does not include memory consumed by non-element attributes of the array object.

Note that from the code above you can conclude that your calculation excludes non-element attributes given that the value is equal to the one from ndarray.nbytes.

A list of the non-element attributes can be found in the section Array Attributes, including here for completeness:

ndarray.flags Information about the memory layout of the array.
ndarray.shape Tuple of array dimensions.
ndarray.strides Tuple of bytes to step in each dimension when traversing an array.
ndarray.ndim Number of array dimensions.
ndarray.data Python buffer object pointing to the start of the array’s data.
ndarray.size Number of elements in the array.
ndarray.itemsize Length of one array element in bytes.
ndarray.nbytes Total bytes consumed by the elements of the array.
ndarray.base Base object if memory is from some other object.

With regards to sys.getsizeof it can be read in the documentation (emphasis mine) that:

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

like image 102
Dani Mesejo Avatar answered Oct 25 '22 11:10

Dani Mesejo


Search on [numpy]getsizeof produces many potential duplicates.

The basic points are:

  1. a list is a container, and getsizeof docs warns us that it returns only the size of the container, not the size of the elements that it references. So by itself it is an unreliable measure to the total size of a list (or tuple or dict).

  2. getsizeof is a fairly good measure of arrays, if you take into account the roughly 100 bytes of "overhead". That overhead will be a big part of a small array, and a minor thing when looking at a large one. nbytes is the simpler way of judging array memory use.

  3. But for views, the data-buffer is shared with the base, and doesn't count when using getsizeof.

  4. object dtype arrays contain references like lists, to the same getsizeof caution applies.

Overall I think understanding how arrays and lists are stored is more useful way of judging their respective memory use. Focus more on the computational efficiency than memory use. For small stuff, and iterative uses, lists are better. Arrays are best when they are large, and you use array methods to do the calculations.

like image 36
hpaulj Avatar answered Oct 25 '22 11:10

hpaulj


Because numpy arrays have shapes, strides, and other member variables that define the data layout it is reasonable that (might) require some extra memory for this!

A list on the other hand has no specific type, or shape, etc.

Although, if you start appending elements on a list instead of simply writing them as an array, and also go to larger numbers of elements, e.g. 1e7, you will see different behaviour!

Example case:

import numpy as np
import sys

N = int(1e7)

narray = np.zeros(N);
mylist = []

for i in range(N):
    mylist.append(narray[i])

print("size of np.array:", sys.getsizeof(narray))
print("size of list    :", sys.getsizeof(mylist))

On my (ASUS) Ubuntu 20.04 PC I get:

size of np.array: 80000104
size of list    : 81528048

Note that is not only the memory footprint important in an application's efficiency! The data layout is sometimes way more important.

like image 45
Andreas Hadjigeorgiou Avatar answered Oct 25 '22 13:10

Andreas Hadjigeorgiou