Why are __getitem__(key) and get(key) significantly slower than [key]?

Question

It was my understanding that brackets were nothing more than a wrapper for __getitem__. Here is how I benchmarked this:

First, I generated a semi-large dictionary.

items = {}
for i in range(1000000):
    items[i] = 1

Then, I used cProfile to test the following three functions:

def get2(items):
    for k in items.iterkeys():
        items.get(k)

def magic3(items):
    for k in items.iterkeys():
        items.__getitem__(k)

def brackets1(items):
    for k in items.iterkeys():
        items[k]

The results looked like so:

         1000004 function calls in 3.779 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.779    3.779 <string>:1(<module>)
        1    2.135    2.135    3.778    3.778 dict_get_items.py:15(get2)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  1000000    1.644    0.000    1.644    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'iterkeys' of 'dict' objects}


         1000004 function calls in 3.679 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.679    3.679 <string>:1(<module>)
        1    2.083    2.083    3.679    3.679 dict_get_items.py:19(magic3)
  1000000    1.596    0.000    1.596    0.000 {method '__getitem__' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'iterkeys' of 'dict' objects}


         4 function calls in 0.136 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.136    0.136 <string>:1(<module>)
        1    0.136    0.136    0.136    0.136 dict_get_items.py:11(brackets1)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'iterkeys' of 'dict' objects}

Is the issue in the way I am benchmarking? I tried replacing the bracket access with a simple "pass" to ensure that the data was actually being accessed, and found that "pass" was running much quicker. My interpretation of this was that the data was indeed being accessed. I've also tried appending to a new list, which gave similar results.

Ben · Accepted Answer

First, the disassembly posted by Not_a_Golfer:

>>> d = {1:2}
>>> dis.dis(lambda: d[1])
  1           0 LOAD_GLOBAL              0 (d)
              3 LOAD_CONST               1 (1)
              6 BINARY_SUBSCR       
              7 RETURN_VALUE   

>>> dis.dis(lambda: d.get(1))
  1           0 LOAD_GLOBAL              0 (d)
              3 LOAD_ATTR                1 (get)
              6 LOAD_CONST               1 (1)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE  

>>> dis.dis(lambda: d.__getitem__(1))
  1           0 LOAD_GLOBAL              0 (d)
              3 LOAD_ATTR                1 (__getitem__)
              6 LOAD_CONST               1 (1)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE

Now, getting the benchmarking right is obviously important to read anything into the results, and I don't know enough to help much there. But assuming there really is a difference (which makes sense to me), here's my guesses about why there is:

dict.get simply "does more"; it has to check if the key is present, and if not return its second argument (which defaults to None). This means there's some form of conditional or exception-catching, so I am completely unsurprised that this would have different timing characteristics to the more basic operation of retrieving the value associated with a key.
Python has a specific bytecode for the "subscription" operation (as demonstrated in the disassembly). The builtin types, including dict, are implemented primarily in C and their implementations do not necessarily play by the normal Python rules (only their interfaces are required to, and there are plenty of corner cases even there). So my guess would be that the implementation of the BINARY_SUBSCR opcode goes more-or-less directly to the underlying C implementations of builtin types that support this operation. For these types, I expect that it is actually __getitem__ that exists as a Python-level method to wrap the C implementation, rather than that the bracket syntax invokes the Python-level method.

It might be interesting to benchmark thing.__getitem__(key) against thing[key] for an instance of a custom class that implements __getitem__; you might actually see the opposite results there as the BINARY_SUBSCR op-code would internally have to fall back to doing equivalent work to looking up the method and invoking it.

Why are getitem(key) and get(key) significantly slower than [key]?

Tags:

python

benchmarking

profiling

Dan

1 Answers

Ben

Recent Activity

Donate For Us