Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Profiled performance of len(set) vs. set.__len__() in Python 3 [duplicate]

While profiling my Python's application, I've discovered that len() seems to be a very expensive one when using sets. See the below code:

import cProfile

def lenA(s):
    for i in range(1000000):
        len(s);

def lenB(s):
    for i in range(1000000):
        s.__len__();

def main():
    s = set();
    lenA(s);
    lenB(s);

if __name__ == "__main__":
    cProfile.run("main()","stats");

According to profiler's stats below, lenA() seems to be 14 times slower than lenB():

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    1.986    1.986    3.830    3.830  .../lentest.py:5(lenA)
1000000    1.845    0.000    1.845    0.000  {built-in method len}
      1    0.273    0.273    0.273    0.273  .../lentest.py:9(lenB)

Am I missing something? Currently I use __len__() instead of len(), but the code looks dirty :(

like image 611
Tregoreg Avatar asked Jan 08 '12 15:01

Tregoreg


People also ask

Is set more efficient than list Python?

One of the main advantages of using sets in Python is that they are highly optimized for membership tests. For example, sets do membership tests a lot more efficiently than lists.

What does __ len __ mean in Python?

Python len() The len() function returns the number of items (length) in an object.

Is Python set slow?

In Python 2, set is always the slowest. or is the fastest for 2 to 3 literals, and tuple and list are faster with 4 or more literals.

Why Len is not a method?

The fact that len is a function means that classes cannot override this behaviour to avoid the check. As such, len(obj) gives a level of safety that obj. len() cannot.


1 Answers

Obviously, len has some overhead, since it does a function call and translates AttributeError to TypeError. Also, set.__len__ is such a simple operation that it's bound to be very fast in comparison to just about anything, but I still don't find anything like the 14x difference when using timeit:

In [1]: s = set()

In [2]: %timeit s.__len__()
1000000 loops, best of 3: 197 ns per loop

In [3]: %timeit len(s)
10000000 loops, best of 3: 130 ns per loop

You should always just call len, not __len__. If the call to len is the bottleneck in your program, you should rethink its design, e.g. cache sizes somewhere or calculate them without calling len.

like image 115
Fred Foo Avatar answered Sep 26 '22 03:09

Fred Foo