Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What algorithm is used when using the in operator in python to search a list?

When using the 'in' operator to search for an item in a list e.g.

if item in list:
  print item

What algorithm is used to search for this item. Is it a straight search of the list from beginning to end or does it use something like binary search?

like image 623
techshare1337 Avatar asked May 06 '12 05:05

techshare1337


People also ask

What does the IN operator do for lists in Python?

Python's in operator lets you loop through all the members of a collection(such as a list or a tuple) and check if there's a member in the list that's equal to the given item.

How do you search a list in Python?

To find an element in the list, use the Python list index() method, The index() is an inbuilt Python method that searches for an item in the list and returns its index. The index() method finds the given element in the list and returns its position.

What kind of algorithm technique is used by the linear search algorithm in Python?

Searching is a very basic necessity when you store data in different data structures. The simplest approach is to go across every element in the data structure and match it with the value you are searching for. This is known as Linear search.


2 Answers

lists can't be assumed to be in sorted order (or any order at all), so binary search won't work. Nor can the keys be assumed to be hashable, so unlike a dict or set a hash-table lookup can't be used to accelerate the search

At a guess it's a straight-through check of every element from first to last.

I'll try and dig up the relevant Python source code.

--

Edit: The Python list.__contains__() function, which implements the in operator, is defined in listobject.c:

   393 static int
   394 list_contains(PyListObject *a, PyObject *el)
   395 {
   396     Py_ssize_t i;
   397     int cmp;
   398 
   399     for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
   400         cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
   401                                            Py_EQ);
   402     return cmp;
   403 }

It iterates over every element in the list, from the first element to the last element (or until it finds a match.) No shortcuts here.

--

Edit 2: The plot thickens. If Python detects that you're testing for membership of an element in a constant list or set, like:

if letter in ['a','e','i','o','u']:    # list version
if letter in {'a','e','i','o','u'}:    # set version

Edit 3 [@JohnMachin]:

The constant list is optimised to a constant tuple in 2.5-2.7 and 3.1-3.3.
The constant set is optimised to a (constant) frozenset in 3.3.

See also @CoryCarson's answer.

like image 170
Li-aung Yip Avatar answered Oct 05 '22 22:10

Li-aung Yip


If list is a literal list, Python 3.2+ will take a faster approach: http://docs.python.org/dev/whatsnew/3.2.html#optimizations

like image 36
Cory Carson Avatar answered Oct 05 '22 23:10

Cory Carson