list is known to initialize with a big chunk of extra space to optimize the time needed to grow it (on average we don't have to allocate a new array on every append). What about set? The following construction wastes space because of the temporary list. I understand tuple is more space-efficient because it is immutable. Can we get the same saving for set while keeping it mutable?

set([1, 2, 3])
Lists are slightly faster than sets when you just want to iterate over the values. Sets, however, are significantly faster than lists when you want to check whether they contain an item. They can only hold unique (hashable) items, though.
Sets use hashing to perform lookups, which makes them far faster than lists in this regard. (In one practical example, code using lists took about 45 seconds to run, whereas the same code with sets took less than a tenth of a second!)
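A quick way to reproduce the effect yourself (a minimal sketch; the container size of 100,000 and the 10,000 lookups are arbitrary choices, and the absolute timings will vary by machine):

import time

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needles = range(0, 100_000, 10)   # 10,000 membership tests

start = time.perf_counter()
hits = sum(1 for n in needles if n in haystack_list)  # each test scans the list
print("list:", time.perf_counter() - start, "s,", hits, "hits")

start = time.perf_counter()
hits = sum(1 for n in needles if n in haystack_set)   # each test is a hash lookup
print("set: ", time.perf_counter() - start, "s,", hits, "hits")

On a typical machine the list version is thousands of times slower, for exactly the reason above.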
set is as fast as it gets. However, if you can rewrite your code to build the set once and never change it, you can use the frozenset built-in type instead. It is exactly the same, except immutable. If you are still having speed problems after that, you need to speed your program up in other ways, such as by running it under PyPy instead of CPython.
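A minimal sketch of the swap (the values are arbitrary):

fs = frozenset([1, 2, 3])
print(2 in fs)   # True: membership tests are the same O(1) hash lookups as set's
# fs.add(4)      # AttributeError: frozenset has no mutating methods

Note that what frozenset buys you is immutability and hashability, not memory: it uses the same hash-table layout as set.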
The set is far faster, in general. Testing for membership in a list is O(n), linear in the size of the list. Testing for membership in a set, like adding to one, is O(1) on average, independent of the number of items in the set.
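You can watch the gap widen with n (a sketch; the sizes and repeat count are arbitrary):

import timeit

for n in (1_000, 10_000, 100_000):
    setup = f"data = list(range({n})); s = set(data)"
    # an absent item is the list's worst case: all n elements get scanned
    t_list = timeit.timeit("-1 in data", setup=setup, number=100)
    t_set = timeit.timeit("-1 in s", setup=setup, number=100)
    print(f"n={n}: list {t_list:.5f}s  set {t_set:.7f}s")

The list timings should grow roughly 10x per row, while the set timings stay flat.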
>>> from sys import getsizeof as size
>>> s = set(xrange(100))   # Python 2; use range() in Python 3
>>> l = list(xrange(100))
>>> size(s)
8424
>>> size(l)
1016
Sets take up more memory than lists. Some of the functionality that sets offer requires more memory (e.g. quick membership tests).
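To tie this back to the question: tuple does shave a little off list, but frozenset brings no memory saving at all; in CPython it shares set's hash-table layout and merely removes the mutating methods. A quick check (a sketch; exact byte counts vary by CPython version and platform):

from sys import getsizeof

items = list(range(100))
print("list:     ", getsizeof(items))
print("tuple:    ", getsizeof(tuple(items)))
print("set:      ", getsizeof(set(items)))
print("frozenset:", getsizeof(frozenset(items)))   # same size as the set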
It seems that for small lists a set can take up almost 10x as much memory, while for larger lists the ratio fluctuates between roughly 3x and 6x. The fluctuation is not from collision chaining (CPython sets use open addressing, not linked lists); it comes from the hash table resizing in power-of-two jumps, so the ratio depends on how full the table happens to be at each size, while the list grows in much smaller increments.
In Python 3, getsizeof(range(n)) always returns a small constant, because a range object is a lazy sequence that stores only its start, stop, and step rather than materializing its values, so I tested by building the actual lists. EDIT: I could also have used getsizeof(list(range(n))).
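For reference, the constant size is easy to verify (byte counts below are from one 64-bit CPython 3 build; yours may differ):

from sys import getsizeof

print(getsizeof(range(10)))        # 48: a range stores only start, stop, step
print(getsizeof(range(10**9)))     # 48 again, regardless of the length
print(getsizeof(list(range(10))))  # grows with the element count

Here is the full test: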
from sys import getsizeof  # returns the (shallow) memory size of an object in bytes

list_lengths = [100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000]  # from 100 to 1 billion

for length in list_lengths:
    my_list = [num for num in range(length)]
    my_set = set(my_list)
    print("list length: {}".format(len(my_list)))
    print("memory size of my_list: {}".format(getsizeof(my_list)))
    print("memory size of my_set: {}".format(getsizeof(my_set)))
    print("set to list size ratio: {}".format(getsizeof(my_set) / getsizeof(my_list)))
Output:
list length: 100
memory size of my_list: 912
memory size of my_set: 8416
set to list size ratio: 9.228070175438596
list length: 1000
memory size of my_list: 9024
memory size of my_set: 32992
set to list size ratio: 3.6560283687943262
list length: 10000
memory size of my_list: 87624
memory size of my_set: 524512
set to list size ratio: 5.985939925134666
list length: 100000
memory size of my_list: 824464
memory size of my_set: 4194528
set to list size ratio: 5.087581750082478
list length: 1000000
memory size of my_list: 8697464
memory size of my_set: 33554656
set to list size ratio: 3.8579815909557085
list length: 10000000
memory size of my_list: 81528056
memory size of my_set: 268435680
set to list size ratio: 3.2925558779421897
list length: 100000000
memory size of my_list: 859724472
memory size of my_set: 4294967520
set to list size ratio: 4.995748824048828
list length: 1000000000
memory size of my_list: 8058558880
memory size of my_set: 34359738592 # more memory than my computer has; presumably the OS is swapping pages out to the SSD at this point
set to list size ratio: 4.263757218089594
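One caveat on all of these numbers: getsizeof is shallow. For the list it counts only the array of pointers, and for the set only the hash table, not the int objects they point to. Both containers reference the same int objects here, so the comparison between them is still fair, but the absolute figures understate total memory use. A rough deep count using only the stdlib:

from sys import getsizeof

data = list(range(1000))
shallow = getsizeof(data)
deep = shallow + sum(map(getsizeof, data))  # add each referenced int object
print("shallow:", shallow, "deep:", deep)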