I prefer using nested functions instead of methods or global functions in Python whenever possible. So I decided to test their performance because it seams that when you define a function in another function there will be an overhead for the definition of the inner function in each call of the outer function.
At best I was hoping for the global function to be just slightly faster, but surprisingly the nested function was faster. Does anyone have any idea why?
This is my code:
from time import clock
def a(n):
return n + 1
def b1(loopcount):
return sum([a(n) for n in range(loopcount)])
def b2(loopcount):
def a(n):
return n + 1
return sum([a(n) for n in range(loopcount)])
powers = [5, 6, 7]
b1times = []
b2times = []
print " ", "".join(["{:^10d}".format(n) for n in powers])
for i in range(5):
for power in powers:
t = clock()
b1(10**power)
b1times.append(clock() - t)
for power in powers:
t = clock()
b2(10**power)
b2times.append(clock() - t)
print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
print ""
b1times = []
b2times = []
And this is the result on my computer:
5 6 7
b1: 0.08200 0.82773 8.47946
b2: 0.06914 0.79637 8.18571
b1: 0.07332 0.82139 8.68262
b2: 0.06547 0.82088 8.19606
b1: 0.07963 0.82625 9.65037
b2: 0.06617 0.82027 8.21412
b1: 0.07630 0.82112 8.49082
b2: 0.06541 0.80686 8.20532
b1: 0.12328 0.87034 8.42964
b2: 0.07059 0.79717 8.24620
UPDATE: using @Janne Karila's comment
Now that I'm calling b1 and b2 more, b1 becomes faster. So as @Kos and @Pavel Anossov said in their answers a few factors affect the speed here and you can't make a general statement.
Thanks everyone!
from time import *
def a1(n):
return n + 1
def b1(n):
return a1(n)
def b2(n):
def a2():
return n + 1
return a2()
powers = [4, 5, 6]
b1times = []
b2times = []
print " ", "".join(["{:^10d}".format(n) for n in powers])
for i in range(5):
for power in powers:
t = clock()
sum([b1(n) for n in range(10**power)])
b1times.append(clock() - t)
for power in powers:
t = clock()
sum([b2(n) for n in range(10**power)])
b2times.append(clock() - t)
print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
print ""
b1times = []
b2times = []
There are several parts that have an effect here:
1) The time to define the function (create the function object),
2) The time to look up the function object by name,
3) The time to actually call the function.
Your global function example is faster with 1) (no need to redefine a
in each call to b1
). However, it is slower with in 2) because global variable lookup is slower than local lookup.
Why can't we have both then?
I've extended your benchmark with a solution that uses the global function, but avoids the global lookup using a local variable. It seems to be the fastest of the three on my machine:
5 6 7
b1: 0.04147 0.44421 4.46508
b2: 0.03399 0.43321 4.41121
b3: 0.03258 0.41821 4.25542
b1: 0.03240 0.42998 4.39774
b2: 0.03320 0.43465 4.42229
b3: 0.03155 0.42109 4.23669
b1: 0.03273 0.43321 4.37266
b2: 0.03326 0.43551 4.42208
b3: 0.03137 0.42356 4.25341
b1: 0.03253 0.43104 4.40466
b2: 0.03401 0.43719 4.42996
b3: 0.03155 0.41681 4.24132
b1: 0.03244 0.42965 4.37192
b2: 0.03310 0.43629 4.42727
b3: 0.03117 0.41701 4.23932
LOAD_GLOBAL is much slower than LOAD_FAST.
5 0 LOAD_GLOBAL 0 (sum)
3 BUILD_LIST 0
6 LOAD_GLOBAL 1 (range)
9 LOAD_FAST 0 (loopcount)
12 CALL_FUNCTION 1
15 GET_ITER
>> 16 FOR_ITER 18 (to 37)
19 STORE_FAST 1 (n)
22 LOAD_GLOBAL 2 (a)
25 LOAD_FAST 1 (n)
28 CALL_FUNCTION 1
31 LIST_APPEND 2
34 JUMP_ABSOLUTE 16
>> 37 CALL_FUNCTION 1
40 RETURN_VALUE
8 0 LOAD_CONST 1 (<code object a at 01E57A40, ...>)
3 MAKE_FUNCTION 0
6 STORE_FAST 1 (a)
10 9 LOAD_GLOBAL 0 (sum)
12 BUILD_LIST 0
15 LOAD_GLOBAL 1 (range)
18 LOAD_FAST 0 (loopcount)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 18 (to 46)
28 STORE_FAST 2 (n)
31 LOAD_FAST 1 (a)
34 LOAD_FAST 2 (n)
37 CALL_FUNCTION 1
40 LIST_APPEND 2
43 JUMP_ABSOLUTE 25
>> 46 CALL_FUNCTION 1
49 RETURN_VALUE
Whether the difference makes up for creating a function every time depends on the function (and how many times you use LOAD_GLOBAL).
Creating a local reference to a global function to use in an inner loop is a relatively common optimisation:
def a(n):
return n + 1
def b1(loopcount):
local_a = a
return sum([local_a(n) for n in range(loopcount)])
This should still be faster than looking up a global and more maintainable than a nested function.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With