Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is "if not (a and b)" faster than "if not a or not b"?

Tags:

On a whim, I recently tested these two methods with timeit, to see which evaluation method was faster:

import timeit  """Test method returns True if either argument is falsey, else False."""  def and_chk((a, b)):     if not (a and b):         return True     return False  def not_or_chk((a, b)):     if not a or not b:         return True     return False 

...and got these results:

 VALUES FOR a,b ->      0,0         0,1         1,0         1,1         method     and_chk(a,b)    0.95559     0.98646     0.95138     0.98788  not_or_chk(a,b)    0.96804     1.07323     0.96015     1.05874                                             ...seconds per 1,111,111 cycles. 

The difference in efficiency is between one and nine percent, always in favour of if not (a and b), which is the opposite of what I might expect since I understand that if not a or not b will evaluate its terms (if not a and then if not b) in order, running the if block once it encounters a true expression (and there are no and clauses). In contrast, the and_chk method needs to evaluate both clauses before it can return any result to the if not.. that wraps it.

The timing results, however, disprove this understanding. How, then, is the if condition being evaluated? I am perfectly aware of the fact that this degree of microoptimization is practically, if not completely, pointless. I just want to understand how Python is going about it.


For completion's sake, this is how I set up timeit...

cyc = 1111111  bothFalse_and = iter([(0,0)] * cyc) zeroTrue_and = iter([(1,0)] * cyc) oneTrue_and = iter([(0,1)] * cyc) bothTrue_and = iter([(1,1)] * cyc)  bothFalse_notor = iter([(0,0)] * cyc) zeroTrue_notor = iter([(1,0)] * cyc) oneTrue_notor = iter([(0,1)] * cyc) bothTrue_notor = iter([(1,1)] * cyc)  time_bothFalse_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothFalse_and as tups, and_chk') time_zeroTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import zeroTrue_and as tups, and_chk') time_oneTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import oneTrue_and as tups, and_chk') time_bothTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothTrue_and as tups, and_chk')  time_bothFalse_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothFalse_notor as tups, not_or_chk') time_zeroTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import zeroTrue_notor as tups, not_or_chk') time_oneTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import oneTrue_notor as tups, not_or_chk') time_bothTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothTrue_notor as tups, not_or_chk') 

...then ran each timeit.Timer(..) function with .timeit(cyc) to get the results posted.

like image 778
Augusta Avatar asked Apr 10 '15 00:04

Augusta


2 Answers

TL;DR

The not_or_chk function requires two unary operations in addition to two jumps (in the worst case), while the and_chk function only has the two jumps (again, in the worst case).

Details

The dis module to the rescue! The dis module lets you take a look at the Python bytecode disassembly of your code. For example:

import dis  """Test method returns True if either argument is falsey, else False."""  def and_chk((a, b)):     if not (a and b):         return True     return False  def not_or_chk((a, b)):     if not a or not b:         return True     return False  print("And Check:\n") print(dis.dis(and_chk))  print("Or Check:\n") print(dis.dis(not_or_chk)) 

Produces this output:

And Check:    5           0 LOAD_FAST                0 (.0)               3 UNPACK_SEQUENCE          2               6 STORE_FAST               1 (a)               9 STORE_FAST               2 (b)    6          12 LOAD_FAST                1 (a)    * This block is the *              15 JUMP_IF_FALSE_OR_POP    21        * disassembly of    *              18 LOAD_FAST                2 (b)    * the "and_chk"     *         >>   21 POP_JUMP_IF_TRUE        28        * function          *    7          24 LOAD_GLOBAL              0 (True)              27 RETURN_VALUE    8     >>   28 LOAD_GLOBAL              1 (False)              31 RETURN_VALUE None Or Check:   10           0 LOAD_FAST                0 (.0)               3 UNPACK_SEQUENCE          2               6 STORE_FAST               1 (a)               9 STORE_FAST               2 (b)   11          12 LOAD_FAST                1 (a)    * This block is the *              15 UNARY_NOT                         * disassembly of    *              16 POP_JUMP_IF_TRUE        26        * the "not_or_chk"  *              19 LOAD_FAST                2 (b)    * function          *              22 UNARY_NOT              23 POP_JUMP_IF_FALSE       30   12     >>   26 LOAD_GLOBAL              0 (True)              29 RETURN_VALUE   13     >>   30 LOAD_GLOBAL              1 (False)              33 RETURN_VALUE None 

Take a look at the two blocks of Python bytecode that I've marked with the asterisks. Those blocks are your two disassembled functions. Note that and_chk only has two jumps, and the calculations in the function are made while deciding whether or not to take the jump.

On the other hand, the not_or_chkfunction requires the not operation to be carried out twice in the worst case, in addition to the interpreter deciding whether or not to take the jump.

like image 125
skrrgwasme Avatar answered Sep 25 '22 07:09

skrrgwasme


I just noticed this question via your Meta SO question: Is it appropriate to share the results of my research toward solving my own minor questions?. As you mention in that question (and one of the tags on this question indicates), this sort of thing falls into the category of micro-optimization. Ideally, one shouldn't need to worry about such minor performance differences, and as Knuth says, premature optimization is the root of all evil. But I guess it is fun and instructive to investigate such matters, as it can give you a better feel for how Python works "under the hood".

Mephy's comment prompted me to see what the timing differences were for if-less versions of your functions. The results are interesting, IMHO. I also took the opportunity to streamline your testing procedure.

#!/usr/bin/env python  ''' Do timeit tests on various implementations of NAND      NAND returns True if either argument is falsey, else False.     From https://stackoverflow.com/q/29551438/4014959     Written by PM 2Ring 2015.04.09 '''  from timeit import Timer import dis  def and_chk(a, b):     return not (a and b)  def and_chk_if(a, b):     if not (a and b):         return True     else:         return False  def not_or_chk(a, b):     return not a or not b  def not_or_chk_if(a, b):     if not a or not b:         return True     else:         return False   #All the functions funcs = (     and_chk,     and_chk_if,     not_or_chk,     not_or_chk_if, )  #Argument tuples to test the functions with bools = (0, 1) bool_tups = [(u, v) for u in bools for v in bools]   def show_dis():     ''' Show the disassembly for each function '''     print 'Disassembly'     for func in funcs:         fname = func.func_name         print '\n%s' % fname         dis.dis(func)     print   def verify():     ''' Verify that the functions actually perform as intended '''     print 'Verifying...'     for func in funcs:         fname = func.func_name         print '\n%s' % fname         for args in bool_tups:             print args, func(*args)     print   def time_test(loops, reps):     ''' Print timing stats for all the functions '''     print 'Timing tests\nLoops = %d, Repetitions = %d' % (loops, reps)      for func in funcs:         fname = func.func_name         print '\n%s' % fname         setup = 'from __main__ import %s' % fname         for args in bool_tups:             t = Timer('%s%s' % (fname, args), setup)             r = t.repeat(reps, loops)             r.sort()             print args, r   show_dis() verify() time_test(loops=520000, reps=3) 

output

Disassembly  and_chk  13           0 LOAD_FAST                0 (a)               3 JUMP_IF_FALSE            4 (to 10)               6 POP_TOP                            7 LOAD_FAST                1 (b)         >>   10 UNARY_NOT                         11 RETURN_VALUE          and_chk_if  16           0 LOAD_FAST                0 (a)               3 JUMP_IF_FALSE            4 (to 10)               6 POP_TOP                            7 LOAD_FAST                1 (b)         >>   10 JUMP_IF_TRUE             5 (to 18)              13 POP_TOP                17          14 LOAD_GLOBAL              0 (True)              17 RETURN_VALUE                 >>   18 POP_TOP                19          19 LOAD_GLOBAL              1 (False)              22 RETURN_VALUE                      23 LOAD_CONST               0 (None)              26 RETURN_VALUE          not_or_chk  22           0 LOAD_FAST                0 (a)               3 UNARY_NOT                          4 JUMP_IF_TRUE             5 (to 12)               7 POP_TOP                            8 LOAD_FAST                1 (b)              11 UNARY_NOT                    >>   12 RETURN_VALUE          not_or_chk_if  25           0 LOAD_FAST                0 (a)               3 UNARY_NOT                          4 JUMP_IF_TRUE             8 (to 15)               7 POP_TOP                            8 LOAD_FAST                1 (b)              11 UNARY_NOT                         12 JUMP_IF_FALSE            5 (to 20)         >>   15 POP_TOP                26          16 LOAD_GLOBAL              0 (True)              19 RETURN_VALUE                 >>   20 POP_TOP                28          21 LOAD_GLOBAL              1 (False)              24 RETURN_VALUE                      25 LOAD_CONST               0 (None)              28 RETURN_VALUE          Verifying...  and_chk (0, 0) True (0, 1) True (1, 0) True (1, 1) False  and_chk_if (0, 0) True (0, 1) True (1, 0) True (1, 1) False  not_or_chk (0, 0) True (0, 1) True (1, 0) True (1, 1) False  not_or_chk_if (0, 0) True (0, 1) True (1, 0) True (1, 1) False  Timing tests Loops = 520000, Repetitions = 3  and_chk (0, 0) [0.36773014068603516, 0.37793493270874023, 0.38387489318847656] (0, 1) [0.36292791366577148, 0.37119913101196289, 0.37146902084350586] (1, 0) [0.38673520088195801, 0.39340090751647949, 0.39670205116271973] (1, 1) [0.38938498497009277, 0.39505791664123535, 0.40034103393554688]  and_chk_if (0, 0) [0.4021449089050293, 0.40345501899719238, 0.41558098793029785] (0, 1) [0.40686416625976562, 0.41213202476501465, 0.44800615310668945] (1, 0) [0.4281308650970459, 0.42916202545166016, 0.43681907653808594] (1, 1) [0.46246123313903809, 0.46759700775146484, 0.47544980049133301]  not_or_chk (0, 0) [0.35435700416564941, 0.36368083953857422, 0.36867713928222656] (0, 1) [0.35602092742919922, 0.35732793807983398, 0.36237406730651855] (1, 0) [0.39550518989562988, 0.40660715103149414, 0.40977287292480469] (1, 1) [0.4060060977935791, 0.4112389087677002, 0.41296815872192383]  not_or_chk_if (0, 0) [0.4308779239654541, 0.44109201431274414, 0.45827698707580566] (0, 1) [0.43600606918334961, 0.4370419979095459, 0.47623395919799805] (1, 0) [0.48452401161193848, 0.48769593238830566, 0.49147915840148926] (1, 1) [0.53107500076293945, 0.54048299789428711, 0.55434417724609375] 

These timings were performed using Python 2.6.6 on a 2GHz Pentium 4 (single-core 32 bit) running Mepis 11 (a Debian family Linux distro).

You'll note that I've avoided using your next(tups) strategy to get the arguments for each function call and instead I'm passing the args directly, as constants. The time taken to perform next(tups) should be fairly constant, but it's best to eliminate such overheads, when practical, so that the reported time measurements more accurately reflect the performance of the code we're really interested in.

Also, it's usual to perform several repetitions of the timing loop and take the minimum value; FWIW, I generally do 3 to 5 reps. From the timeit docs:

Note

It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.

Your Meta post says that you want to perform & report on other micro-optimization experiments, so you might be interested in taking a look at some code I posted a few months ago that does time tests on various implementations of the factorial function.

like image 24
PM 2Ring Avatar answered Sep 23 '22 07:09

PM 2Ring