 

Is Python set more space efficient than list?

Tags:

python

list is known to over-allocate its underlying storage to optimize the time needed to grow it (on average, an append does not have to allocate a new array and copy everything over).

What about set?

The following construction wastes space because of the intermediate list. I understand that a tuple saves space because it is immutable. Can we get the same saving for a set while keeping it mutable?

set([1, 2, 3])
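
(For reference, a minimal way to watch the over-allocation and to build a small set without an intermediate list; exact byte counts depend on the Python version and platform:)

from sys import getsizeof

l = []
print(getsizeof(l))   # size of an empty list object
l.append(1)
print(getsizeof(l))   # grows by more than one slot: the list over-allocates
s = {1, 2, 3}         # a set literal builds the set directly, no intermediate list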

asked Nov 25 '12 by User007


People also ask

Is set more efficient than list in Python?

Lists are slightly faster than sets when you just want to iterate over the values. Sets, however, are significantly faster than lists if you want to check whether an item is contained in them. They can only contain unique items, though.

Which is more efficient list or set?

Sets use hashing to perform lookups, which makes them far faster than lists in this regard. (In the practical example, the code using lists took about 45 seconds to run, whereas the code with sets took less than a tenth of a second!)
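
A minimal way to see this difference yourself (a sketch; timings vary by machine):

from timeit import timeit

data_list = list(range(100000))
data_set = set(data_list)

# looking for the last element: the list is scanned front to back, the set is hashed into
print(timeit(lambda: 99999 in data_list, number=1000))
print(timeit(lambda: 99999 in data_set, number=1000))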

Are set operations faster in Python?

set is as fast as it gets. However, if you rewrite your code to create the set once and never change it, you can use the frozenset built-in type. It's exactly the same, except immutable. If you're still having speed problems, you need to speed your program up in other ways, such as by using PyPy instead of CPython.
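
For instance, a set that is built once and only read afterwards can be declared as a frozenset (a sketch; ALLOWED_METHODS is a hypothetical name):

# hypothetical constant built once and never modified
ALLOWED_METHODS = frozenset({"GET", "POST", "PUT"})

print("GET" in ALLOWED_METHODS)        # membership tests work exactly as with set
print(frozenset({1, 2}) == {1, 2})     # compares equal to a set with the same items
# ALLOWED_METHODS.add("DELETE")        # AttributeError: frozenset has no add()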

Is set add faster than list append?

The set is far faster, in general. Testing for membership in a list is O(n), linear in the size of the list. Adding to a set (or testing membership in it) is O(1) on average, independent of the number of items already in the set.
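
That difference is why deduplicating while preserving order is usually done with a helper set, e.g. (a sketch):

items = list(range(50000)) * 2   # every value appears twice
seen = set()
unique = []
for x in items:
    if x not in seen:     # O(1) average; `x not in unique` would be O(n) per check
        seen.add(x)       # O(1) average
        unique.append(x)
print(len(unique))        # 50000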


2 Answers

>>> from sys import getsizeof as size
>>> s = set(xrange(100))
>>> l = list(xrange(100))
>>> size(s)
8424
>>> size(l)
1016

Sets take up more memory than lists: some of the functionality that sets offer (e.g. fast membership tests) relies on keeping a sparsely filled hash table, which costs extra space.
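
(On Python 3, where xrange is gone, the same comparison might look like this; exact byte counts depend on the version and platform:)

from sys import getsizeof as size

s = set(range(100))       # range() replaces Python 2's xrange()
l = list(range(100))
print(size(s), size(l))   # the set is several times larger than the list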

answered Oct 16 '22 by arshajii


It seems that for small lists, sets can take up about 10x as much memory, while for larger lists the ratio fluctuates between roughly 3x and 6x. The fluctuation comes from how the two structures grow: a CPython set is an open-addressing hash table that is resized in large jumps once it becomes too full, whereas a list over-allocates by a much smaller factor, so the ratio depends on how recently the set's table was resized. (CPython does not use linked-list chaining for collisions.)
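
Those resize jumps are easy to watch with getsizeof as a set grows (a sketch; the exact thresholds depend on the CPython version):

from sys import getsizeof

s = set()
last = getsizeof(s)
for i in range(100000):
    s.add(i)
    if getsizeof(s) != last:     # the hash table was just resized
        last = getsizeof(s)
        print(len(s), last)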

In Python 3, getsizeof(range(...)) always returns a small constant because a range object is a lazy sequence that computes its values on demand instead of storing them, so I tested this by building the actual lists. EDIT: I could also have used getsizeof(list(range(...))).

from sys import getsizeof # returns memory size in bytes
list_lengths = [100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000] # from 100 to 1 billion
for length in list_lengths:
    my_list = [num for num in range(length)]
    my_set = set(my_list)
    print("list length: {}".format(len(my_list)))
    print("memory size of my_list: {}".format(getsizeof(my_list)))
    print("memory size of my_set: {}".format(getsizeof(my_set)))
    print("set to list size ratio: {}".format(getsizeof(my_set)/getsizeof(my_list)))


Output:
list length: 100
memory size of my_list: 912
memory size of my_set: 8416
set to list size ratio: 9.228070175438596
list length: 1000
memory size of my_list: 9024
memory size of my_set: 32992
set to list size ratio: 3.6560283687943262
list length: 10000
memory size of my_list: 87624
memory size of my_set: 524512
set to list size ratio: 5.985939925134666
list length: 100000
memory size of my_list: 824464
memory size of my_set: 4194528
set to list size ratio: 5.087581750082478
list length: 1000000
memory size of my_list: 8697464
memory size of my_set: 33554656
set to list size ratio: 3.8579815909557085
list length: 10000000
memory size of my_list: 81528056
memory size of my_set: 268435680
set to list size ratio: 3.2925558779421897
list length: 100000000
memory size of my_list: 859724472
memory size of my_set: 4294967520
set to list size ratio: 4.995748824048828
list length: 1000000000
memory size of my_list: 8058558880
memory size of my_set: 34359738592 # my computer doesn't have this much memory, not sure what's going on here. Maybe it's writing to SSD
set to list size ratio: 4.263757218089594
answered Oct 16 '22 by gunit