The set is far faster, in general. Testing for membership in a list is O(n), linear in the size of the list. Adding to a set is O(1), independent of the number of the items in the list.
Lists are allocated in two blocks: the fixed one with all the Python object information and a variable sized block for the data. It is the reason creating a tuple is faster than List. It also explains the slight difference in indexing speed is faster than lists, because in tuples for indexing it follows fewer pointers.
Use Tuple. If your data should or does not need to be changed. Tuples are faster than lists.
set is as fast as it gets. However, if you rewrite your code to create the set once, and not change it, you can use the frozenset built-in type. It's exactly the same except immutable.
list
: Imagine you are looking for your socks in your closet, but you don't know in which drawer your socks are, so you have to search drawer by drawer until you find them (or maybe you never do). That's what we call O(n)
, because in the worst scenario, you will look in all your drawers (where n
is the number of drawers).
set
: Now, imagine you're still looking for your socks in your closet, but now you know in which drawer your socks are, say in the 3rd drawer. So, you will just search in the 3rd drawer, instead of searching in all drawers. That's what we call O(1)
, because in the worst scenario you will look in just one drawer.
Sets are implemented using hash tables. Whenever you add an object to a set, the position within the memory of the set
object is determined using the hash of the object to be added. When testing for membership, all that needs to be done is basically to look if the object is at the position determined by its hash, so the speed of this operation does not depend on the size of the set. For lists, in contrast, the whole list needs to be searched, which will become slower as the list grows.
This is also the reason that sets do not preserve the order of the objects you add.
Note that sets aren't faster than lists in general -- membership test is faster for sets, and so is removing an element. As long as you don't need these operations, lists are often faster.
I think you need to take a good look at a book on data structures. Basically, Python lists are implemented as dynamic arrays and sets are implemented as a hash tables.
The implementation of these data structures gives them radically different characteristics. For instance, a hash table has a very fast lookup time but cannot preserve the order of insertion.
Python uses hashtables, which have O(1) lookup.
While I have not measured anything performance related in python so far, I'd still like to point out that lists are often faster.
Yes, you have O(1) vs. O(n). But always remember that this gives information only about the asymptotic behavior of something. That means if your n is very high O(1) will always be faster - theoretically. In practice however n often needs to be much bigger than your usual data set will be.
So sets are not faster than lists per se, but only if you have to handle a lot of elements.
Basically, Depends on the operation you are doing …
*For adding an element - then a set doesn’t need to move any data, and all it needs to do is calculate a hash value and add it to a table. For a list insertion then potentially there is data to be moved.
*For deleting an element - all a set needs to do is remove the hash entry from the hash table, for a list it potentially needs to move data around (on average 1/2 of the data.
*For a search (i.e. an in operator) - a set just needs to calculate the hash value of the data item, find that hash value in the hash table, and if it is there - then bingo. For a list, the search has to look up each item in turn - on average 1/2 of all of the terms in the list. Even for many 1000s of items a set will be far quicker to search.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With