On a whim, I recently tested these two methods with `timeit`, to see which evaluation method was faster:
```python
import timeit

"""Test method returns True if either argument is falsey, else False."""

def and_chk((a, b)):
    if not (a and b):
        return True
    return False

def not_or_chk((a, b)):
    if not a or not b:
        return True
    return False
```
...and got these results:
```
VALUES FOR a,b ->     0,0      0,1      1,0      1,1
method
and_chk(a,b)        0.95559  0.98646  0.95138  0.98788
not_or_chk(a,b)     0.96804  1.07323  0.96015  1.05874
```

...seconds per 1,111,111 cycles.
The difference in efficiency is between one and nine percent, always in favour of `if not (a and b)`, which is the opposite of what I might expect, since I understand that `if not a or not b` will evaluate its terms (`not a` and then `not b`) in order, running the `if` block once it encounters a true expression (and there are no `and` clauses). In contrast, the `and_chk` method needs to evaluate both clauses before it can return any result to the `if not ...` that wraps it.
The timing results, however, disprove this understanding. How, then, is the `if` condition being evaluated? I am perfectly aware that this degree of micro-optimization is practically, if not completely, pointless; I just want to understand how Python is going about it.
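One way to confirm that both spellings short-circuit in exactly the same places is to wrap each operand in a function that records when it is called. This is a quick Python 3 sketch (`operand` is just an illustrative helper, not part of the code above):

```python
# Sketch: make evaluation order observable by recording each operand access.
calls = []

def operand(name, value):
    calls.append(name)   # record that this operand was evaluated
    return value

# "not (a and b)": when a is falsey, "and" short-circuits and b is skipped.
result = not (operand("a", 0) and operand("b", 1))
print(result, calls)   # True ['a']

# "not a or not b": when "not a" is true, "or" short-circuits and b is skipped.
calls.clear()
result = (not operand("a", 0)) or (not operand("b", 1))
print(result, calls)   # True ['a']
```

Since both versions skip the second operand in the same cases, any timing difference must come from the bytecode each one compiles to, not from a difference in evaluation order.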
For completeness' sake, this is how I set up `timeit`...
```python
cyc = 1111111

bothFalse_and = iter([(0,0)] * cyc)
zeroTrue_and  = iter([(1,0)] * cyc)
oneTrue_and   = iter([(0,1)] * cyc)
bothTrue_and  = iter([(1,1)] * cyc)

bothFalse_notor = iter([(0,0)] * cyc)
zeroTrue_notor  = iter([(1,0)] * cyc)
oneTrue_notor   = iter([(0,1)] * cyc)
bothTrue_notor  = iter([(1,1)] * cyc)

time_bothFalse_and = timeit.Timer('and_chk(next(tups))',
                                  'from __main__ import bothFalse_and as tups, and_chk')
time_zeroTrue_and = timeit.Timer('and_chk(next(tups))',
                                 'from __main__ import zeroTrue_and as tups, and_chk')
time_oneTrue_and = timeit.Timer('and_chk(next(tups))',
                                'from __main__ import oneTrue_and as tups, and_chk')
time_bothTrue_and = timeit.Timer('and_chk(next(tups))',
                                 'from __main__ import bothTrue_and as tups, and_chk')
time_bothFalse_notor = timeit.Timer('not_or_chk(next(tups))',
                                    'from __main__ import bothFalse_notor as tups, not_or_chk')
time_zeroTrue_notor = timeit.Timer('not_or_chk(next(tups))',
                                   'from __main__ import zeroTrue_notor as tups, not_or_chk')
time_oneTrue_notor = timeit.Timer('not_or_chk(next(tups))',
                                  'from __main__ import oneTrue_notor as tups, not_or_chk')
time_bothTrue_notor = timeit.Timer('not_or_chk(next(tups))',
                                   'from __main__ import bothTrue_notor as tups, not_or_chk')
```
...then ran each `timeit.Timer(..)` instance with `.timeit(cyc)` to get the results posted above.
The `not_or_chk` function requires two unary operations in addition to two jumps (in the worst case), while the `and_chk` function only has the two jumps (again, in the worst case).
The `dis` module to the rescue! The `dis` module lets you take a look at the Python bytecode disassembly of your code. For example, this:
```python
import dis

"""Test method returns True if either argument is falsey, else False."""

def and_chk((a, b)):
    if not (a and b):
        return True
    return False

def not_or_chk((a, b)):
    if not a or not b:
        return True
    return False

print("And Check:\n")
print(dis.dis(and_chk))
print("Or Check:\n")
print(dis.dis(not_or_chk))
```
Produces this output:
```
And Check:

  5           0 LOAD_FAST                0 (.0)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (a)
              9 STORE_FAST               2 (b)

  6          12 LOAD_FAST                1 (a)        * This block is the *
             15 JUMP_IF_FALSE_OR_POP    21            * disassembly of    *
             18 LOAD_FAST                2 (b)        * the "and_chk"     *
        >>   21 POP_JUMP_IF_TRUE        28            * function          *

  7          24 LOAD_GLOBAL              0 (True)
             27 RETURN_VALUE

  8     >>   28 LOAD_GLOBAL              1 (False)
             31 RETURN_VALUE
None
Or Check:

 10           0 LOAD_FAST                0 (.0)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (a)
              9 STORE_FAST               2 (b)

 11          12 LOAD_FAST                1 (a)        * This block is the  *
             15 UNARY_NOT                             * disassembly of     *
             16 POP_JUMP_IF_TRUE        26            * the "not_or_chk"   *
             19 LOAD_FAST                2 (b)        * function           *
             22 UNARY_NOT
             23 POP_JUMP_IF_FALSE       30

 12     >>   26 LOAD_GLOBAL              0 (True)
             29 RETURN_VALUE

 13     >>   30 LOAD_GLOBAL              1 (False)
             33 RETURN_VALUE
None
```
Take a look at the two blocks of Python bytecode that I've marked with asterisks. Those blocks are your two disassembled functions. Note that `and_chk` only has two jumps, and the calculations in the function are made while deciding whether or not to take each jump. On the other hand, the `not_or_chk` function requires the `not` operation to be carried out twice in the worst case, in addition to the interpreter deciding whether or not to take each jump.
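You can poke at this yourself with `dis.get_instructions` (a sketch in modern Python 3 syntax; Python 3 has no tuple parameters, and opcode names and counts differ between interpreter versions, so treat the printed tallies as illustrative rather than identical to the Python 2 output above):

```python
import dis
from collections import Counter

def and_chk(a, b):
    if not (a and b):
        return True
    return False

def not_or_chk(a, b):
    if not a or not b:
        return True
    return False

# Tally the opcodes each function compiles to; in the Python 2 disassembly
# shown above, not_or_chk carries extra UNARY_NOT instructions that
# and_chk avoids.
for func in (and_chk, not_or_chk):
    ops = Counter(ins.opname for ins in dis.get_instructions(func))
    print(func.__name__, dict(ops))

# Whatever the bytecode looks like, the two functions must agree.
for a in (0, 1):
    for b in (0, 1):
        assert and_chk(a, b) == not_or_chk(a, b) == (not (a and b))
```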
I just noticed this question via your Meta SO question: *Is it appropriate to share the results of my research toward solving my own minor questions?* As you mention in that question (and as one of the tags on this question indicates), this sort of thing falls into the category of micro-optimization. Ideally, one shouldn't need to worry about such minor performance differences, and as Knuth says, premature optimization is the root of all evil. But I guess it is fun and instructive to investigate such matters, as it can give you a better feel for how Python works "under the hood".
Mephy's comment prompted me to see what the timing differences were for `if`-less versions of your functions. The results are interesting, IMHO. I also took the opportunity to streamline your testing procedure.
```python
#!/usr/bin/env python

''' Do timeit tests on various implementations of NAND

    NAND returns True if either argument is falsey, else False.

    From https://stackoverflow.com/q/29551438/4014959

    Written by PM 2Ring 2015.04.09
'''

from timeit import Timer
import dis

def and_chk(a, b):
    return not (a and b)

def and_chk_if(a, b):
    if not (a and b):
        return True
    else:
        return False

def not_or_chk(a, b):
    return not a or not b

def not_or_chk_if(a, b):
    if not a or not b:
        return True
    else:
        return False

#All the functions
funcs = (
    and_chk,
    and_chk_if,
    not_or_chk,
    not_or_chk_if,
)

#Argument tuples to test the functions with
bools = (0, 1)
bool_tups = [(u, v) for u in bools for v in bools]

def show_dis():
    ''' Show the disassembly for each function '''
    print 'Disassembly'
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        dis.dis(func)
    print

def verify():
    ''' Verify that the functions actually perform as intended '''
    print 'Verifying...'
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        for args in bool_tups:
            print args, func(*args)
    print

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    print 'Timing tests\nLoops = %d, Repetitions = %d' % (loops, reps)
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        setup = 'from __main__ import %s' % fname
        for args in bool_tups:
            t = Timer('%s%s' % (fname, args), setup)
            r = t.repeat(reps, loops)
            r.sort()
            print args, r

show_dis()
verify()
time_test(loops=520000, reps=3)
```
Output:
```
Disassembly

and_chk
 13           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP
              7 LOAD_FAST                1 (b)
        >>   10 UNARY_NOT
             11 RETURN_VALUE

and_chk_if
 16           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP
              7 LOAD_FAST                1 (b)
        >>   10 JUMP_IF_TRUE             5 (to 18)
             13 POP_TOP

 17          14 LOAD_GLOBAL              0 (True)
             17 RETURN_VALUE
        >>   18 POP_TOP

 19          19 LOAD_GLOBAL              1 (False)
             22 RETURN_VALUE
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE

not_or_chk
 22           0 LOAD_FAST                0 (a)
              3 UNARY_NOT
              4 JUMP_IF_TRUE             5 (to 12)
              7 POP_TOP
              8 LOAD_FAST                1 (b)
             11 UNARY_NOT
        >>   12 RETURN_VALUE

not_or_chk_if
 25           0 LOAD_FAST                0 (a)
              3 UNARY_NOT
              4 JUMP_IF_TRUE             8 (to 15)
              7 POP_TOP
              8 LOAD_FAST                1 (b)
             11 UNARY_NOT
             12 JUMP_IF_FALSE            5 (to 20)
        >>   15 POP_TOP

 26          16 LOAD_GLOBAL              0 (True)
             19 RETURN_VALUE
        >>   20 POP_TOP

 28          21 LOAD_GLOBAL              1 (False)
             24 RETURN_VALUE
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE

Verifying...

and_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

and_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

not_or_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

not_or_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

Timing tests
Loops = 520000, Repetitions = 3

and_chk
(0, 0) [0.36773014068603516, 0.37793493270874023, 0.38387489318847656]
(0, 1) [0.36292791366577148, 0.37119913101196289, 0.37146902084350586]
(1, 0) [0.38673520088195801, 0.39340090751647949, 0.39670205116271973]
(1, 1) [0.38938498497009277, 0.39505791664123535, 0.40034103393554688]

and_chk_if
(0, 0) [0.4021449089050293, 0.40345501899719238, 0.41558098793029785]
(0, 1) [0.40686416625976562, 0.41213202476501465, 0.44800615310668945]
(1, 0) [0.4281308650970459, 0.42916202545166016, 0.43681907653808594]
(1, 1) [0.46246123313903809, 0.46759700775146484, 0.47544980049133301]

not_or_chk
(0, 0) [0.35435700416564941, 0.36368083953857422, 0.36867713928222656]
(0, 1) [0.35602092742919922, 0.35732793807983398, 0.36237406730651855]
(1, 0) [0.39550518989562988, 0.40660715103149414, 0.40977287292480469]
(1, 1) [0.4060060977935791, 0.4112389087677002, 0.41296815872192383]

not_or_chk_if
(0, 0) [0.4308779239654541, 0.44109201431274414, 0.45827698707580566]
(0, 1) [0.43600606918334961, 0.4370419979095459, 0.47623395919799805]
(1, 0) [0.48452401161193848, 0.48769593238830566, 0.49147915840148926]
(1, 1) [0.53107500076293945, 0.54048299789428711, 0.55434417724609375]
```
These timings were performed using Python 2.6.6 on a 2GHz Pentium 4 (single-core 32 bit) running Mepis 11 (a Debian family Linux distro).
You'll note that I've avoided using your `next(tups)` strategy to get the arguments for each function call; instead, I'm passing the args directly, as constants. The time taken to perform `next(tups)` should be fairly constant, but it's best to eliminate such overheads, when practical, so that the reported time measurements more accurately reflect the performance of the code we're really interested in.
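To get a feel for how large that overhead is, you can time the argument-fetching machinery by itself. This is a sketch in modern Python 3 syntax; `nand` is a hypothetical helper standing in for the functions above, and the absolute numbers will vary by machine:

```python
import timeit

cyc = 100000

# Cost of fetching arguments through the iterator, by itself: each
# call to next(tups) pops one pre-built tuple off the iterator.
fetch_time = timeit.timeit('next(tups)',
                           setup='tups = iter([(0, 1)] * %d)' % cyc,
                           number=cyc)

# Cost of the function call with the arguments written as constants.
const_time = timeit.timeit('nand(0, 1)',
                           setup='def nand(a, b): return not (a and b)',
                           number=cyc)

print('next(tups) alone:    %.5f s for %d calls' % (fetch_time, cyc))
print('nand(0, 1) constant: %.5f s for %d calls' % (const_time, cyc))
```

Comparing the two numbers shows how much of the original measurements was spent fetching arguments rather than exercising the logic under test.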
Also, it's usual to perform several repetitions of the timing loop and take the minimum value; FWIW, I generally do 3 to 5 reps. From the timeit docs:
> **Note**
>
> It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.
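Following that advice, a minimal take-the-minimum pattern looks like this (shown in modern Python 3 syntax; the `globals=globals()` form of `timeit.repeat` requires Python 3.5+, and the loop counts here are just illustrative):

```python
import timeit

def and_chk(a, b):
    return not (a and b)

# repeat() returns one total time per repetition; take the minimum,
# since higher values mostly reflect interference from other processes.
results = timeit.repeat('and_chk(1, 1)', globals=globals(),
                        repeat=5, number=100000)
print('best of 5:', min(results))
```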
Your Meta post says that you want to perform & report on other micro-optimization experiments, so you might be interested in taking a look at some code I posted a few months ago that does time tests on various implementations of the factorial function.