I am trying to understand the internal working of the <code>in</code> command and <code>index()</code> of the list data structure. When I say: <pre class="prettyprint"><code>if something not in some_list : print "do something" </code></pre> Is it traversing the whole list internally, similar to a <code>for</code> loop or does it use, better approaches like <code>hashtables</code> etc. Also the <code>index()</code> in lists, gives an error if the item is not present in the list. Is the working of both <code>in</code> and <code>index()</code> the same? If <code>index()</code> is better then is it possible to catch the error when an item is not present and if it is possible, is it good programming?

Good question! Yes, both methods you mention will iterate the list, necessarily. Python does not use hashtables for lists because there is no restriction that the list elements are hashable. If you know about "Big O" notation, the <code>list</code> data structure is designed for O(1) access by looking up a known index, e.g. <code>my_list[13]</code>. It is O(n) for membership testing. There are other data structures which are optimised for O(1) speed for membership testing (i.e. <code>__contains__</code>), namely <code>set</code> and <code>dict</code>. These are implemented with hashtables. Here is an example of how you can use <code>IPython</code> to verify the time-complexity of sets and lists, to confirm these claims: <pre class="prettyprint"><code>In [1]: short_list, long_list = range(1000), range(10000) In [2]: timeit 'potato' not in short_list 10000 loops, best of 3: 40.9 µs per loop In [3]: timeit 'potato' not in long_list 1000 loops, best of 3: 440 µs per loop In [4]: small_set, big_set = set(short_list), set(long_list) In [5]: timeit 'potato' not in small_set 10000000 loops, best of 3: 72.9 ns per loop In [6]: timeit 'potato' not in big_set 10000000 loops, best of 3: 84.5 ns per loop </code></pre>

in and index function of list [Python]

Tags:

python

list

python-3.x

data-structures

python-2.7

I am trying to understand the internal working of the in command and index() of the list data structure.

When I say:

if something not in some_list :
    print "do something"

Is it traversing the whole list internally, similar to a for loop or does it use, better approaches like hashtables etc.

Also the index() in lists, gives an error if the item is not present in the list. Is the working of both in and index() the same? If index() is better then is it possible to catch the error when an item is not present and if it is possible, is it good programming?

388

asked Sep 22 '14 23:09

KKa

2 Answers

Good question! Yes, both methods you mention will iterate the list, necessarily. Python does not use hashtables for lists because there is no restriction that the list elements are hashable.

If you know about "Big O" notation, the list data structure is designed for O(1) access by looking up a known index, e.g. my_list[13]. It is O(n) for membership testing.

There are other data structures which are optimised for O(1) speed for membership testing (i.e. __contains__), namely set and dict. These are implemented with hashtables.

Here is an example of how you can use IPython to verify the time-complexity of sets and lists, to confirm these claims:

In [1]: short_list, long_list = range(1000), range(10000)

In [2]: timeit 'potato' not in short_list
10000 loops, best of 3: 40.9 µs per loop

In [3]: timeit 'potato' not in long_list
1000 loops, best of 3: 440 µs per loop

In [4]: small_set, big_set = set(short_list), set(long_list)

In [5]: timeit 'potato' not in small_set
10000000 loops, best of 3: 72.9 ns per loop

In [6]: timeit 'potato' not in big_set
10000000 loops, best of 3: 84.5 ns per loop

180

answered Oct 16 '22 16:10

wim

For lists, both methods (in and index()) iterate over the list to check for the item you're looking for, unfortunately. They will stop iteration as soon as the result of the membership test is known, which means they will iterate to the end if the item is not found.

As far as I know, if you must work with lists, the not in construct is the most Python and the one you should go with (but you should dump those unnecessary parentheses).

If you don't need to specifically use a list, the built-in set type can often work in its place. The set is a data structure similar to the list, but it uses a hashing algorithm to test for the presence of an item, so if you're doing a lot of that kind of work, you may consider switching. Read the docs I've linked to though, because sets are unordered, so they don't support things like slicing or indexing.

Yes, you can plan for times when the item you're checking for is not present in your data structure. You're looking for a Try/Except Block:

example_list = [1,2,3]

try:
     index_of_4 = example_list.index(4)
except ValueError:
     print("Oops! 4 wasn't in the list!")

When you know exceptions may occur in your program, you can wrap the offending code in a block like this to gracefully catch and recover from exceptions. It is indeed good programming practice to recover from errors and exceptions as gracefully as you can, even if that means just printing an error message and exiting.

answered Oct 16 '22 16:10

skrrgwasme

Related questions
                            
                                pyparsing performance and memory usage
                            
                                subprocess.Popen using relative paths
                            
                                SVD - Matrix transformation Python
                            
                                Regex not start with dot or end with dot
                            
                                Tkinter: windows without title bar but resizable
                            
                                gdb python api: is it possible to make a call to a class/struct method
                            
                                Optimizing strings in Cython
                            
                                Removing Ttk Notebook Tab Dashed Line
                            
                                The smallest python distribtion to run Sympy, Scipy, Numpy and Matplotlib
                            
                                Finding Two-Tailed P Value from t-distribution and Degrees of Freedom in Python
                            
                                In Python, what is the difference between f.readlines() and list(f)
                            
                                WTForm: FieldList with SelectField, how do I render?
                            
                                OpenCV for Python - AttributeError: 'module' object has no attribute 'connectedComponents'
                            
                                Yet Another 'Connection reset by peer' Error
                            
                                Memory leak in Pandas.groupby.apply()?
                            
                                How to allow anonymous uploads to cloud storage
                            
                                elasticsearch exceptions ConnectionError
                            
                                Colon in file names in Python
                            
                                Complexity of Python Sort Method [duplicate]
                            
                                why python using redis is so slow?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With