>>> from sys import getsizeof
>>> a=[i for i in range(1000)]
>>> b={i for i in range(1000)}
>>> getsizeof(a)
9024
>>> getsizeof(b)
32992
My question is: why does a set consume so much more memory than a list? Lists are ordered, sets are not. Is it the internal structure of a set that consumes the memory? Or does a list contain pointers and a set does not? Or is sys.getsizeof simply wrong here? I've seen questions comparing tuples, lists and dictionaries, but I could not find any comparison between lists and sets.
To see where the memory goes, consider storing a million small integers. Each one fits easily in a 64-bit machine word, so one might hope Python would store them in no more than ~8 MB: a million 8-byte values. In practice, CPython uses more like 35 MB of RAM, because every number is a full Python object and the list stores only references to those objects.
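A rough way to check that figure yourself on 64-bit CPython (the ~28-byte size of a small int object is an implementation detail and varies slightly between versions):

from sys import getsizeof

nums = list(range(1_000_000))

# The list's own buffer: one 8-byte reference per element plus a small header.
list_bytes = getsizeof(nums)

# Each int above 256 is a separate heap object of roughly 28 bytes.
int_bytes = sum(getsizeof(n) for n in nums)

print(f"list buffer: {list_bytes / 1e6:.1f} MB")
print(f"int objects: {int_bytes / 1e6:.1f} MB")
print(f"total:       {(list_bytes + int_bytes) / 1e6:.1f} MB")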
When you create a list object, the list object by itself takes about 64 bytes of memory (the exact figure varies slightly by CPython version), and each item adds 8 bytes to the size of the list, because the list stores only a reference to each item object.
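You can verify the header and per-reference costs directly; the header is 56 or 64 bytes depending on the Python version, but each element adds one 8-byte reference on a 64-bit build:

from sys import getsizeof

print(getsizeof([]))           # just the list header
print(getsizeof([None]))       # header + one 8-byte reference
print(getsizeof([None] * 10))  # header + ten 8-byte references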
Generally, lists are faster than sets for iteration and indexed access. But when searching for an element in a collection, sets are faster, because they are implemented as hash tables: Python does not have to scan the whole set, so the average time complexity of a membership test is O(1).
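A quick, informal timing sketch of that difference (the exact numbers depend on your machine; the point is the gap, not the absolute values):

import timeit

setup = "data_list = list(range(100_000)); data_set = set(data_list)"

# Worst case for the list: the element is near the end, so 'in' scans almost everything.
print(timeit.timeit("99_999 in data_list", setup=setup, number=1_000))

# The set hashes the value and jumps straight to its bucket: O(1) on average.
print(timeit.timeit("99_999 in data_set", setup=setup, number=1_000))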
Tuples, by comparison, take up slightly less memory than lists and are faster to access; they aren't as flexible, but they are more efficient, and a common use is to serve as dictionary keys. Sets are also collection structures, but with two differences from lists and tuples: they are unordered, and their elements must be unique (and hashable).
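For example (the exact sizes vary slightly by Python version, but the ordering holds):

from sys import getsizeof

print(getsizeof([1, 2, 3]))   # list: header plus a resizable array of references
print(getsizeof((1, 2, 3)))   # tuple: smaller fixed-size header plus 3 references

# Tuples are hashable, so they can serve as dictionary keys; lists cannot.
grid = {(0, 0): "origin", (1, 2): "point"}
print(grid[(1, 2)])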
I think it's because of the inherent difference between a list and a set (or dict), i.e. the way in which the elements are stored.

A list is nothing but a collection of references to the original objects. Suppose you create 1000 integers: 1000 integer objects are created, and the list itself only contains references to these objects.

A set or dictionary, on the other hand, has to compute a hash value for each of those 1000 integers, and it keeps a sparse table of buckets, so the memory consumed grows with both the number of elements and the size of the table.
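One way to see that hashing really happens on insertion into a set but not into a list is a small sketch with an int subclass that announces its __hash__ calls (LoudInt is just an illustrative name):

class LoudInt(int):
    """An int that reports whenever its hash is computed."""
    def __hash__(self):
        print(f"hashing {int(self)}")
        return super().__hash__()

items = [LoudInt(1), LoudInt(2), LoudInt(3)]

lst = list(items)   # prints nothing: a list only stores references
st = set(items)     # prints "hashing 1", "hashing 2", "hashing 3"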
For example, in both set and dict the smallest size is 8 by default (that is, if you are only storing 3 values, Python will still allocate 8 buckets). On resize, the number of buckets increases by 4x until we reach 50,000 elements, after which the size is increased by 2x. This gives the following possible sizes:

8, 32, 128, 512, 2048, 8192, 32768, 131072, 262144, ...
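You can watch these resizes happen by tracking getsizeof as a set grows; the exact thresholds are CPython implementation details and may differ between versions, but the jumps line up with the bucket counts above:

from sys import getsizeof

s = set()
last = getsizeof(s)
for i in range(3000):
    s.add(i)
    size = getsizeof(s)
    if size != last:
        # A jump means the hash table was rebuilt with the next bucket count.
        print(f"{len(s):>5} elements -> {size} bytes")
        last = size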
Some examples:
In [25]: from sys import getsizeof
In [26]: a=[i for i in range(60000)]
In [27]: b={i for i in range(60000)}
In [30]: b1={i for i in range(100000)}
In [31]: a1=[i for i in range(100000)]
In [32]: getsizeof(a)
Out[32]: 514568
In [33]: getsizeof(b)
Out[33]: 2097376
In [34]: getsizeof(a1)
Out[34]: 824464
In [35]: getsizeof(b1)
Out[35]: 4194528
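Those numbers line up with the bucket counts: on a 64-bit build, each bucket in a CPython set stores a hash and a pointer (16 bytes), plus a fixed per-object header. A rough sanity check (the 16-byte entry size and the small header are implementation details and may vary):

from sys import getsizeof

b = {i for i in range(60_000)}     # ends up with 131072 buckets
b1 = {i for i in range(100_000)}   # ends up with 262144 buckets

# Roughly: fixed header + 16 bytes (hash + pointer) per bucket.
print(getsizeof(b), 131072 * 16)   # both around 2 MB
print(getsizeof(b1), 262144 * 16)  # both around 4 MB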
Answers:
Yes, it's the internal structure of a set, i.e. the way it stores its elements in a hash table, that consumes this much memory. And sys.getsizeof is correct; there's nothing wrong with using it here.
For a more detailed reference on list, set and dict internals, please refer to the relevant chapter of High Performance Python.