I am trying to understand the internal working of the in
command and index()
of the list data structure.
When I say:
if something not in some_list :
print "do something"
Is it traversing the whole list internally, similar to a for
loop or does it use, better approaches like hashtables
etc.
Also the index()
in lists, gives an error if the item is not present in the list. Is the working of both in
and index()
the same? If index()
is better then is it possible to catch the error when an item is not present and if it is possible, is it good programming?
The index() method returns the index of the given element in the list. If the element is not found, a ValueError exception is raised.
The Python index() method helps you find the index position of an element or an item in a string of characters or a list of items. It spits out the lowest possible index of the specified element in the list. In case the specified item does not exist in the list, a ValueError is returned.
We can unpack list elements to comma-separated variables. Python List can have duplicate elements. They also allow None. List support two operators: + for concatenation and * for repeating the elements.
Note. python lists are 0-indexed. So the first element is 0, second is 1, so on.
Good question! Yes, both methods you mention will iterate the list, necessarily. Python does not use hashtables for lists because there is no restriction that the list elements are hashable.
If you know about "Big O" notation, the list
data structure is designed for O(1) access by looking up a known index, e.g. my_list[13]
. It is O(n) for membership testing.
There are other data structures which are optimised for O(1) speed for membership testing (i.e. __contains__
), namely set
and dict
. These are implemented with hashtables.
Here is an example of how you can use IPython
to verify the time-complexity of sets and lists, to confirm these claims:
In [1]: short_list, long_list = range(1000), range(10000)
In [2]: timeit 'potato' not in short_list
10000 loops, best of 3: 40.9 µs per loop
In [3]: timeit 'potato' not in long_list
1000 loops, best of 3: 440 µs per loop
In [4]: small_set, big_set = set(short_list), set(long_list)
In [5]: timeit 'potato' not in small_set
10000000 loops, best of 3: 72.9 ns per loop
In [6]: timeit 'potato' not in big_set
10000000 loops, best of 3: 84.5 ns per loop
For lists, both methods (in
and index()
) iterate over the list to check for the item you're looking for, unfortunately. They will stop iteration as soon as the result of the membership test is known, which means they will iterate to the end if the item is not found.
As far as I know, if you must work with lists, the not in
construct is the most Python and the one you should go with (but you should dump those unnecessary parentheses).
If you don't need to specifically use a list, the built-in set type can often work in its place. The set is a data structure similar to the list, but it uses a hashing algorithm to test for the presence of an item, so if you're doing a lot of that kind of work, you may consider switching. Read the docs I've linked to though, because sets are unordered, so they don't support things like slicing or indexing.
Yes, you can plan for times when the item you're checking for is not present in your data structure. You're looking for a Try/Except Block:
example_list = [1,2,3]
try:
index_of_4 = example_list.index(4)
except ValueError:
print("Oops! 4 wasn't in the list!")
When you know exceptions may occur in your program, you can wrap the offending code in a block like this to gracefully catch and recover from exceptions. It is indeed good programming practice to recover from errors and exceptions as gracefully as you can, even if that means just printing an error message and exiting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With