Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are nested functions faster than global functions in Python?

I prefer using nested functions instead of methods or global functions in Python whenever possible. So I decided to test their performance because it seams that when you define a function in another function there will be an overhead for the definition of the inner function in each call of the outer function.

At best I was hoping for the global function to be just slightly faster, but surprisingly the nested function was faster. Does anyone have any idea why?

This is my code:

from time import clock

def a(n):
    return n + 1

def b1(loopcount):
    return sum([a(n) for n in range(loopcount)])

def b2(loopcount):
    def a(n):
        return n + 1
    return sum([a(n) for n in range(loopcount)])

powers = [5, 6, 7]
b1times = []
b2times = []
print "   ", "".join(["{:^10d}".format(n) for n in powers])    
for i in range(5):
    for power in powers:
        t = clock()
        b1(10**power)
        b1times.append(clock() - t)
    for power in powers:
        t = clock()
        b2(10**power)
        b2times.append(clock() - t)
    print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
    print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
    print ""
    b1times = []
    b2times = [] 

And this is the result on my computer:

        5         6         7
b1:  0.08200   0.82773   8.47946
b2:  0.06914   0.79637   8.18571

b1:  0.07332   0.82139   8.68262
b2:  0.06547   0.82088   8.19606

b1:  0.07963   0.82625   9.65037
b2:  0.06617   0.82027   8.21412

b1:  0.07630   0.82112   8.49082
b2:  0.06541   0.80686   8.20532

b1:  0.12328   0.87034   8.42964
b2:  0.07059   0.79717   8.24620

UPDATE: using @Janne Karila's comment

Now that I'm calling b1 and b2 more, b1 becomes faster. So as @Kos and @Pavel Anossov said in their answers a few factors affect the speed here and you can't make a general statement.
Thanks everyone!

from time import *

def a1(n):
    return n + 1

def b1(n):
    return a1(n)

def b2(n):
    def a2():
        return n + 1
    return a2()

powers = [4, 5, 6]
b1times = []
b2times = []
print "   ", "".join(["{:^10d}".format(n) for n in powers])    
for i in range(5):
    for power in powers:
        t = clock()
        sum([b1(n) for n in range(10**power)])
        b1times.append(clock() - t)
    for power in powers:
        t = clock()
        sum([b2(n) for n in range(10**power)])
        b2times.append(clock() - t)
    print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
    print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
    print ""
    b1times = []
    b2times = [] 
like image 278
nima Avatar asked Jan 02 '13 12:01

nima


2 Answers

There are several parts that have an effect here:

1) The time to define the function (create the function object),
2) The time to look up the function object by name,
3) The time to actually call the function.

Your global function example is faster with 1) (no need to redefine a in each call to b1). However, it is slower with in 2) because global variable lookup is slower than local lookup.

Why can't we have both then?

I've extended your benchmark with a solution that uses the global function, but avoids the global lookup using a local variable. It seems to be the fastest of the three on my machine:

        5         6         7
b1:  0.04147   0.44421   4.46508
b2:  0.03399   0.43321   4.41121
b3:  0.03258   0.41821   4.25542

b1:  0.03240   0.42998   4.39774
b2:  0.03320   0.43465   4.42229
b3:  0.03155   0.42109   4.23669

b1:  0.03273   0.43321   4.37266
b2:  0.03326   0.43551   4.42208
b3:  0.03137   0.42356   4.25341

b1:  0.03253   0.43104   4.40466
b2:  0.03401   0.43719   4.42996
b3:  0.03155   0.41681   4.24132

b1:  0.03244   0.42965   4.37192
b2:  0.03310   0.43629   4.42727
b3:  0.03117   0.41701   4.23932
like image 166
Kos Avatar answered Oct 21 '22 05:10

Kos


LOAD_GLOBAL is much slower than LOAD_FAST.

b1

  5           0 LOAD_GLOBAL              0 (sum)
              3 BUILD_LIST               0
              6 LOAD_GLOBAL              1 (range)
              9 LOAD_FAST                0 (loopcount)
             12 CALL_FUNCTION            1
             15 GET_ITER
        >>   16 FOR_ITER                18 (to 37)
             19 STORE_FAST               1 (n)
             22 LOAD_GLOBAL              2 (a)
             25 LOAD_FAST                1 (n)
             28 CALL_FUNCTION            1
             31 LIST_APPEND              2
             34 JUMP_ABSOLUTE           16
        >>   37 CALL_FUNCTION            1
             40 RETURN_VALUE

b2

  8           0 LOAD_CONST               1 (<code object a at 01E57A40, ...>)
              3 MAKE_FUNCTION            0
              6 STORE_FAST               1 (a)

 10           9 LOAD_GLOBAL              0 (sum)
             12 BUILD_LIST               0
             15 LOAD_GLOBAL              1 (range)
             18 LOAD_FAST                0 (loopcount)
             21 CALL_FUNCTION            1
             24 GET_ITER
        >>   25 FOR_ITER                18 (to 46)
             28 STORE_FAST               2 (n)
             31 LOAD_FAST                1 (a)
             34 LOAD_FAST                2 (n)
             37 CALL_FUNCTION            1
             40 LIST_APPEND              2
             43 JUMP_ABSOLUTE           25
        >>   46 CALL_FUNCTION            1
             49 RETURN_VALUE

Whether the difference makes up for creating a function every time depends on the function (and how many times you use LOAD_GLOBAL).

 

Creating a local reference to a global function to use in an inner loop is a relatively common optimisation:

def a(n):
    return n + 1

def b1(loopcount):
    local_a = a
    return sum([local_a(n) for n in range(loopcount)])

This should still be faster than looking up a global and more maintainable than a nested function.

like image 37
Pavel Anossov Avatar answered Oct 21 '22 03:10

Pavel Anossov