Difference between Bytearray and List in Python

I am curious to know how memory management differs between bytearray and list in Python.

I have found a few questions like Difference between bytearray and list, but they don't exactly answer my question.

My question, precisely:

>>> from array import array
>>> x = array("B", (1,2,3,4))
>>> x.__sizeof__()
36
>>> y = bytearray((1,2,3,4))
>>> y.__sizeof__()
32
>>> z = [1,2,3,4]
>>> z.__sizeof__()
36

As we can see, there is a difference in size between list/array.array (36 bytes for 4 elements) and bytearray (32 bytes for 4 elements). Can someone explain why this is? It makes sense for the bytearray to occupy 32 bytes of memory for 4 elements (4 * 8 == 32), but how can this be interpreted for list and array.array?

# Let's take the case of bytearray (which makes more sense to me, at least :p)
for i in y:
    print(i, ": ", id(i))

1 :  499962320
2 :  499962336 #diff is 16 units
3 :  499962352 #diff is 16 units
4 :  499962368 #diff is 16 units

Why do the addresses of two contiguous elements differ by 16 units here, when each element occupies only 8 bytes? Does that mean each memory address points to a nibble?

Also, what are the criteria for allocating memory for an integer? I read that Python assigns more memory depending on the value of the integer (correct me if I am wrong): the larger the number, the more memory.

E.g.:

>>> y = 10
>>> y.__sizeof__()
14
>>> y = 1000000
>>> y.__sizeof__()
16
>>> y = 10000000000000
>>> y.__sizeof__()
18

What criteria does Python use to allocate this memory?

And why does Python occupy so much more memory, while C only needs 8 bytes (mine is a 64-bit machine), when these values are well within the range of a 64-bit integer (2 ** 64)?
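A quick way to probe the "larger number, more memory" pattern, assuming CPython: there an int stores its magnitude as an array of fixed-size "digits", whose width is reported by sys.int_info (bits_per_digit, sizeof_digit), behind a fixed object header (reference count, type pointer and a size field) that a plain C integer does not carry. The helper predicted_int_size is hypothetical and models the size as that header plus one digit per bits_per_digit bits. Typical 64-bit builds use 30-bit, 4-byte digits; the sizes quoted above (14, 16, 18) are consistent with 15-bit, 2-byte digits behind a 12-byte header, which is what one would expect from the 32-bit build listed in the metadata, though that is an inference.

```python
import sys

def predicted_int_size(n):
    """Hypothetical model of sys.getsizeof(n) for a positive int n (CPython only):
    a fixed header followed by one digit per bits_per_digit bits of magnitude."""
    info = sys.int_info
    # Ceiling division: number of digits needed to hold n's magnitude (at least one).
    ndigits = max(1, -(-n.bit_length() // info.bits_per_digit))
    # Header size inferred from the value 1, which needs exactly one digit.
    header = sys.getsizeof(1) - info.sizeof_digit
    return header + ndigits * info.sizeof_digit

print(sys.int_info)
for value in (10, 1000000, 10000000000000, 2 ** 64):
    print(value, sys.getsizeof(value), predicted_int_size(value))
```

On a typical 64-bit build the size only steps up when a value crosses another multiple of bits_per_digit bits, which is why small and medium values often report the same size.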

Metadata:

Python version: '3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)]'

Machine architecture: 64-bit

P.S.: Kindly point me to a good article that explains Python memory management well. I spent almost an hour trying to figure these things out and ended up asking this question on SO. :(

Sravan K Ghantasala asked Oct 23 '15 21:10





1 Answer

I'm not claiming this is a complete answer, but here are some hints for understanding this.

bytearray is a sequence of bytes, while list is a sequence of object references. So [1,2,3] actually holds memory pointers to integers that are stored elsewhere in memory. To calculate the total memory consumption of a list, we can do this (from here on I use sys.getsizeof everywhere; it calls __sizeof__ and adds garbage-collector overhead for objects the GC tracks):

>>> from sys import getsizeof
>>> x = [1,2,3]
>>> sum(map(getsizeof, x)) + getsizeof(x)
172

The result may differ from machine to machine.
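The one-liner above counts every element, so an object referenced twice by the list is counted twice. A slightly more careful sketch counts each distinct element object once; the helper name list_footprint is hypothetical. It is still shallow (it does not look inside nested containers), and in CPython small ints like these are shared with the rest of the interpreter anyway.

```python
from sys import getsizeof

def list_footprint(items):
    """Hypothetical helper: size of the list object plus each distinct element
    it references, counted once. Elements are not recursed into."""
    seen = set()
    total = getsizeof(items)
    for item in items:
        # The list keeps its items alive, so their ids are unique while we iterate.
        if id(item) not in seen:
            seen.add(id(item))
            total += getsizeof(item)
    return total

x = [1, 2, 3]
print(list_footprint(x))
print(list_footprint([7, 7, 7]))  # in CPython all three slots refer to one int object
```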

Also, look at this:

>>> getsizeof([])
64

That's because lists are mutable. To be fast, this structure allocates a block of memory to store references to objects (plus some storage for metadata, such as the length of the list). When you append items, the next free slots are filled with references to those items. When there is no room left for new items, a new, larger block is allocated, the existing data is copied there, and the old block is released. This is called a dynamic array.
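To make the idea concrete, here is a toy dynamic array. The class DynamicArray is hypothetical and simply doubles its capacity when full; CPython's list uses a milder over-allocation (roughly one-eighth extra plus a small constant), so its capacities diverge from doubling once the list gets longer.

```python
class DynamicArray:
    """Toy dynamic array (hypothetical, for illustration only).

    Keeps references in a fixed-capacity backing list; when it is full,
    allocates a larger backing list and copies the old references over.
    """

    def __init__(self):
        self._capacity = 0
        self._length = 0
        self._slots = []  # backing storage, always self._capacity long

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError("index out of range")
        return self._slots[index]

    @property
    def capacity(self):
        return self._capacity

    def append(self, value):
        if self._length == self._capacity:
            # Out of room: allocate a larger block and copy existing references.
            new_capacity = max(4, self._capacity * 2)
            new_slots = [None] * new_capacity
            new_slots[:self._length] = self._slots[:self._length]
            self._slots = new_slots
            self._capacity = new_capacity
        self._slots[self._length] = value
        self._length += 1

arr = DynamicArray()
capacities = []
for i in range(10):
    arr.append(i)
    capacities.append(arr.capacity)
print(capacities)  # [4, 4, 4, 4, 8, 8, 8, 8, 16, 16]
```

Doubling keeps appends cheap on average (amortized constant time) at the cost of some unused slots; a smaller growth factor, as CPython uses, wastes less memory but resizes more often.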

You can observe this behaviour by running this code:

import sys
data = []
n = 15
for k in range(n):
    a = len(data)            # number of elements
    b = sys.getsizeof(data)  # bytes used by the list object itself
    print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a, b))
    data.append(None)

My results:

Length:   0; Size in bytes:   64 
Length:   1; Size in bytes:   96
Length:   2; Size in bytes:   96 
Length:   3; Size in bytes:   96
Length:   4; Size in bytes:   96 
Length:   5; Size in bytes:  128
Length:   6; Size in bytes:  128 
Length:   7; Size in bytes:  128
Length:   8; Size in bytes:  128 
Length:   9; Size in bytes:  192
Length:  10; Size in bytes:  192 
Length:  11; Size in bytes:  192
Length:  12; Size in bytes:  192 
Length:  13; Size in bytes:  192
Length:  14; Size in bytes:  192

We can see, for example, that the jump from 128 to 192 bytes adds 64 bytes, which is room for 8 memory addresses (64 bits each).
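To turn the byte counts into a number of reference slots directly, here is a variation of the loop above. It assumes, as is the case in CPython's list implementation, that everything beyond the size of an empty list is the array of references, each struct.calcsize("P") bytes wide, and it prints only when the allocation changes.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # bytes per object reference (8 on 64-bit builds)
EMPTY_LIST_SIZE = sys.getsizeof([])  # header plus GC overhead, no reference slots yet

data = []
history = []  # (length, slots) each time the allocation changes
last_size = None
for k in range(15):
    size = sys.getsizeof(data)
    if size != last_size:
        # Assumption (CPython): bytes beyond an empty list are reference slots.
        slots = (size - EMPTY_LIST_SIZE) // POINTER_SIZE
        history.append((len(data), slots))
        print('Length: {0:3d}; Size in bytes: {1:4d}; slots: {2:3d}'.format(len(data), size, slots))
        last_size = size
    data.append(None)
```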

Almost the same goes for bytearray() (change the second line to data = bytearray() and append 1 in the last one):
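With those two changes applied to the loop, it reads:

```python
import sys
data = bytearray()
n = 15
for k in range(n):
    a = len(data)            # number of bytes stored
    b = sys.getsizeof(data)  # bytes used by the bytearray object, buffer included
    print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a, b))
    data.append(1)           # stores the byte value 1 itself, not a reference
```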

Length:   0; Size in bytes:   56
Length:   1; Size in bytes:   58
Length:   2; Size in bytes:   61
Length:   3; Size in bytes:   61
Length:   4; Size in bytes:   63
Length:   5; Size in bytes:   63
Length:   6; Size in bytes:   65
Length:   7; Size in bytes:   65
Length:   8; Size in bytes:   68
Length:   9; Size in bytes:   68
Length:  10; Size in bytes:   68
Length:  11; Size in bytes:   74
Length:  12; Size in bytes:   74
Length:  13; Size in bytes:   74
Length:  14; Size in bytes:   74

The difference is that the memory is now used to hold the actual byte values, not pointers.
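The same point can be checked with a larger container: the extra memory a list needs for n elements is n pointer-sized references (the int objects themselves live elsewhere and are not counted), while a bytearray needs roughly one byte per element. A small sketch, assuming CPython; exact totals vary by version and build.

```python
import struct
import sys

n = 1000
pointer_size = struct.calcsize("P")

# Bytes added on top of an empty container when it holds n elements.
list_extra = sys.getsizeof([0] * n) - sys.getsizeof([])
bytearray_extra = sys.getsizeof(bytearray(n)) - sys.getsizeof(bytearray())

print("list:      {0} bytes for {1} references ({2} bytes each)".format(list_extra, n, pointer_size))
print("bytearray: {0} bytes for {1} byte values".format(bytearray_extra, n))

# A bytearray exposes its storage as a buffer of 1-byte items.
view = memoryview(bytearray((1, 2, 3, 4)))
print(view.itemsize, view.nbytes)  # 1 4
```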

Hope that helps you investigate further.

anti1869 answered Sep 21 '22 17:09
