How do you investigate Python's implementation of built-in methods?

Tags:

python

I'm currently going through a basic compsci course. We use Python's `in` a lot. I'm curious how it's implemented, and what the code that powers `in` looks like.

I can think of how my implementation of such a thing would work, but something I've learned after turning in a couple of homework assignments is that my ways of doing things are usually pretty terrible and inefficient. So I want to start investigating 'good' code.


asked Apr 21 '12 16:04

Zack

2 Answers

The thing about builtin functions, types, operators and so on is that they are not implemented in Python. Rather, in CPython (the standard interpreter) they're implemented in C, which is a much more painful and verbose programming language whose code won't always translate well to Python (usually because things are easier some other way in Python).
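
A quick way to see this for yourself, sketched under the assumption that you are running CPython with the standard library's source files installed: `inspect.getsource` can show the code of pure-Python functions, but it gives up on builtins because there is no Python source to show. The helper name `has_python_source` is made up for this illustration.

```python
import inspect
import textwrap


def has_python_source(obj):
    """Return True if inspect can locate Python source code for obj."""
    try:
        inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: obj is a builtin implemented in C.
        # OSError: obj is Python code, but its source file is unavailable.
        return False
    return True


print(has_python_source(textwrap.dedent))    # pure-Python stdlib function
print(has_python_source(len))                # builtin function written in C
print(has_python_source(list.__contains__))  # C-level method behind `in` for lists
```

On a typical CPython install only the first call should report that source is available; for the other two you have to go to the C sources.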

With that said, you can investigate all of Python's implementation online, via their public source repository.

The implementation for `in` is scattered: there's one implementation per type, plus a more general implementation that calls the type-specific implementation (more on that later). For example, for lists, we'd look for the implementation of lists. In the Python source tree, the source for all builtin objects is in the `Objects` directory. In that directory you'll find `listobject.c`, which contains the implementation for the list object and all its methods.

On the repository at the time of answering, if you look at line 393 you'll find the implementation of the `in` operator (also known as the `__contains__` method, which explains the name of the function). It's fairly straightforward: it just loops through all the elements of the list until either the element is found or there are no more elements, and returns the result of the search. :)

If it helps, in Python the idiomatic way to write this would be:

def __contains__(self, obj):
    for item in self:
        if item == obj:
            return True

    return False
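
One detail the Python version above glosses over: the language reference describes `x in y` for lists as equivalent to `any(x is e or x == e for e in y)`, so identity is checked before equality. Here is a small sketch of where that matters, using NaN (which is not equal to itself). The function `contains_eq_only` is a hypothetical name for the equality-only loop shown above.

```python
def contains_eq_only(seq, obj):
    """Equality-only membership test, mirroring the loop shown above."""
    for item in seq:
        if item == obj:
            return True
    return False


nan = float("nan")

print(nan == nan)                     # False: NaN never equals itself
print(nan in [nan])                   # True: the builtin checks identity first
print(contains_eq_only([nan], nan))   # False: equality alone misses it
```

For ordinary values the two approaches agree, so the simpler loop is still a fair mental model.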

I said earlier that there was a more general implementation. That can be seen in the implementation of `PySequence_Contains` in `abstract.c`. It tries to call the type-specific version and, if the type doesn't provide one, resorts to regular iteration. That loop there is what a regular Python for loop looks like when you write it in C (using the Python C-API).
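
As a rough Python rendering of that dispatch (a sketch of the idea, not a translation of the actual C code), you could write something like the following. The names `sequence_contains` and `Countdown` are invented for this example.

```python
def sequence_contains(container, value):
    """Approximate the general `in` logic: type-specific first, then iterate."""
    # Look the method up on the type, as special-method lookup does.
    contains = getattr(type(container), "__contains__", None)
    if contains is not None:
        return bool(contains(container, value))

    # No type-specific version: fall back to a plain loop over the object.
    for item in container:
        if item is value or item == value:
            return True
    return False


class Countdown:
    """Iterable with no __contains__, so the fallback loop is used."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


print(sequence_contains([1, 2, 3], 2))       # True, via list.__contains__
print(sequence_contains({"a": 1}, "a"))      # True, via dict.__contains__ (keys)
print(sequence_contains(Countdown(3), 2))    # True, via the fallback loop
print(sequence_contains(Countdown(3), 7))    # False
```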

answered Sep 20 '22 23:09

Devin Jeanpierre

From the Data Model section of the Python Language Reference:

> The membership test operators (`in` and `not in`) are normally implemented as an iteration through a sequence. However, container objects can supply the following special method with a more efficient implementation, which also does not require the object be a sequence.
>
> `object.__contains__(self, item)`
>
> Called to implement membership test operators. Should return true if item is in self, false otherwise. For mapping objects, this should consider the keys of the mapping rather than the values or the key-item pairs.
>
> For objects that don’t define `__contains__()`, the membership test first tries iteration via `__iter__()`, then the old sequence iteration protocol via `__getitem__()`, see this section in the language reference.
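
The three routes described in the quoted passage can be tried out with a few toy classes. This is only an illustrative sketch, and the class names are made up.

```python
class WithContains:
    """Defines __contains__, so `in` calls it directly."""

    def __contains__(self, item):
        return item == "special"


class WithIter:
    """No __contains__, so `in` falls back to iterating via __iter__."""

    def __iter__(self):
        return iter(["a", "b", "c"])


class WithGetItem:
    """Neither __contains__ nor __iter__: the old __getitem__ protocol is used."""

    def __getitem__(self, index):
        data = ["x", "y", "z"]
        return data[index]  # IndexError past the end stops the iteration


print("special" in WithContains())  # True
print("b" in WithIter())            # True
print("z" in WithGetItem())         # True
print("q" in WithGetItem())         # False
```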

So, by default, Python iterates over the object to implement the `in` operator. If the object's type defines a `__contains__` method, Python uses that instead of iterating. So what happens in the `__contains__` method? To know exactly, you would have to browse the source. But I can tell you that Python's lists implement `__contains__` using iteration, so a list lookup takes time proportional to the list's length. Python dictionaries and sets are implemented as hash tables, and therefore support much faster membership testing (constant time on average).
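
To make the difference between lists and sets visible without relying on timings, here is a small experiment that counts equality comparisons. `Probe` and `count_eq_calls` are hypothetical helpers written for this sketch, and the exact count for the set is a CPython implementation detail.

```python
class Probe:
    """Wraps a value and counts how often instances are compared with ==."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Probe.eq_calls += 1
        return isinstance(other, Probe) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def count_eq_calls(container, target):
    """Return how many == comparisons `target in container` triggers."""
    Probe.eq_calls = 0
    found = target in container
    assert found, "target is expected to be present"
    return Probe.eq_calls


items = [Probe(i) for i in range(1000)]
as_list = list(items)
as_set = set(items)

# Search for an equal (but not identical) object that sits at the very end.
list_calls = count_eq_calls(as_list, Probe(999))
set_calls = count_eq_calls(as_set, Probe(999))

print(list_calls)  # one comparison per element scanned: 1000
print(set_calls)   # the hash narrows the search, so only a handful at most
```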

answered Sep 21 '22 23:09

Steven Rumbalski
