
if a == b or a == c: vs if a in {b, c}:

Tags:

python

In my code I used to have comparisons like if a == b or a == c or a == d: fairly frequently. At some point I discovered that these could easily be shortened to if a in {b, c, d}: or if a in (b, c, d): if the values aren't hashable. However, I have never seen such a construction in anyone else's code. This is probably because either:

  1. The == way is slower.
  2. The == way is more pythonic.
  3. They actually do subtly different things.
  4. I have, by chance, not looked at any code which required either.
  5. I have seen it and just ignored or forgotten it.
  6. One shouldn't need to have comparisons like this because one's code should be better elsewhere.
  7. Nobody has thought of the in way except me.

Which reason, if any, is it?

LeopardShark asked Aug 21 '17 15:08


3 Answers

For simple values (i.e. not expressions or NaNs), if a == b or a == c and if a in <iterable of b and c> are equivalent.
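To make the two exceptions concrete, here is a minimal sketch. The helper `value` and the `calls` list are hypothetical names used only to record which operands get evaluated. It shows that NaN compares unequal to itself under `==` but is still found by `in` (containers check identity before equality), and that an `or` chain short-circuits while a container literal evaluates every element first.

```python
nan = float("nan")

# == follows IEEE 754: NaN is not equal to anything, itself included.
nan_equal = (nan == nan)
# Containers test identity first, then equality, so the same object is found.
nan_in_set = (nan in {nan})
nan_in_tuple = (nan in (nan,))
print(nan_equal, nan_in_set, nan_in_tuple)  # False True True

calls = []

def value(x):
    """Hypothetical helper: record that it was called, then return x."""
    calls.append(x)
    return x

a = 1

# The or-chain stops at the first match, so value(2) is never called.
calls.clear()
chain_result = (a == value(1) or a == value(2))
chain_calls = list(calls)

# The tuple is built in full before the membership test runs.
calls.clear()
container_result = (a in (value(1), value(2)))
container_calls = list(calls)

print(chain_calls)      # [1]
print(container_calls)  # [1, 2]
```

Both forms give the same truth value here. They differ only in how many operands are evaluated and in how NaN is treated.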

If the values are hashable, it's better to use in with a set literal instead of tuple or list literals:

if a in {b, c}: ...

When all the elements of the set literal are constants, CPython's peephole optimiser is often able to replace it with a cached frozenset() object, and membership tests against sets are O(1) operations on average.
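One way to check whether the optimisation applies is to look at the constants stored in the compiled code object. This is a rough sketch: `has_frozenset_const` is a hypothetical helper, and the behaviour is a CPython implementation detail that may differ between versions and on other interpreters.

```python
def has_frozenset_const(source):
    """Return True if compiling `source` stores a frozenset constant.

    Hypothetical helper for illustration; it only compiles the
    expression and never evaluates it, so the names need not exist.
    """
    code = compile(source, "<example>", "eval")
    return any(isinstance(const, frozenset) for const in code.co_consts)

# A set literal made only of constants, used directly with `in`.
literal_folded = has_frozenset_const("x in {1, 2, 3}")

# A set literal built from variables cannot be precomputed.
names_folded = has_frozenset_const("x in {b, c, d}")

print(literal_folded, names_folded)
```

On recent CPython 3 releases the first value is expected to be True and the second False, which suggests that a set of variables, as in `a in {b, c}`, is rebuilt each time the line runs.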

Eugene Yarmash answered Oct 17 '22 13:10


Performance-wise, "in" is better:

import timeit
timeit.timeit("pub='1'; pub == 1 or pub == '1'")
0.07568907737731934
timeit.timeit("pub='1'; pub in [1, '1']")
0.04272890090942383
timeit.timeit("pub=1; pub == 1 or pub == '1'")
0.07502007484436035
timeit.timeit("pub=1; pub in [1, '1']")
0.07035684585571289

Using "in" also avoids repetition: a == 1 or a == 2 repeats the variable and is harder to read, whereas "in" states the intent directly. It is a simple yet elegant practice. In short, we should use "in" more often if we are not already doing so.

Surjit R answered Oct 17 '22 14:10


I was curious what the timing difference was between a straight comparison and a membership check in a container (loosely called an "array" below; in the code it is a set).

Conclusion: The cost of constructing the array is not free and must be taken into account when considering the speed differences.

If the array is constructed at the time of the comparison, the membership test is slower than the simple comparison. In that case the simple comparison is faster, in or out of a loop.

That said, if the array is already constructed, checking membership in it is faster than the simple comparison, which pays off in a large loop.

$ speed.py
inarray                   x 1000000:  0.277590343844
comparison                x 1000000:  0.347808290754
makearray                 x 1000000:  0.408771123295

import timeit

NUM = 1000000

a = 1
b = 2
c = 3
d = 1

array = {b,c,d}
tup = (b,c,d)
lst = [b,c,d]

def comparison():
    if a == b or a == c or a == d:
        pass

def makearray():
    if a in {b, c, d}:
        pass

def inarray():
    if a in array:
        pass

def maketuple():
    if a in (b,c,d):
        pass

def intuple():
    if a in tup:
        pass

def makelist():
    if a in [b,c,d]:
        pass

def inlist():
    if a in lst:
        pass


def time_all(funcs, params=None):
    timers = []
    for func in funcs:
        if params:
            tx = timeit.Timer(lambda: func(*params))
        else:
            tx = timeit.Timer(lambda: func())
        timers.append([func, tx.timeit(NUM)])

    # Report the fastest function first.
    for func, speed in sorted(timers, key=lambda x: x[1]):
        print("{fn:<25} x {n}: ".format(fn=func.__name__, n=NUM), speed)
    print("")
    return

time_all([comparison,
          makearray,
          inarray,
          intuple,
          maketuple,
          inlist,
          makelist
          ], 
         )

This doesn't quite answer your question of why you don't often see comparisons using in. I would be speculating, but it's likely a mix of reasons 1, 2 and 4, plus whatever situation the author was in when writing that particular bit of code.

I've personally used both methods depending on the situation. The choice usually came down to speed or simplicity.
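As a rough illustration of the "already constructed" case, the container can be built once at module level and reused on every call. The names `SAFE_METHODS`, `is_safe` and `is_safe_chain` are hypothetical and chosen only for this sketch.

```python
# Built once at import time, so each call pays only for the lookup.
# All names here are hypothetical, for illustration only.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})

def is_safe(method):
    """Membership test against the prebuilt frozenset."""
    return method in SAFE_METHODS

def is_safe_chain(method):
    """Equivalent chained comparison, shown for contrast."""
    return method == "GET" or method == "HEAD" or method == "OPTIONS"

requests = ["GET", "POST", "HEAD", "DELETE"]
safe = [m for m in requests if is_safe(m)]
print(safe)  # ['GET', 'HEAD']
```

Whether this is measurably faster than the chained comparison depends on the number of values and the interpreter, so it is worth timing both on the real workload.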


edit:

@bracco23 is right: there are slight timing differences depending on whether a tuple, an array or a list is used.

$ speed.py
inarray                   x 1000000:  0.260784980761
intuple                   x 1000000:  0.288696420718
inlist                    x 1000000:  0.311479982167
maketuple                 x 1000000:  0.356532747578
comparison                x 1000000:  0.360010093964
makearray                 x 1000000:  0.41094386108
makelist                  x 1000000:  0.433603059099

Marcel Wilson answered Oct 17 '22 13:10