In my code I used to have comparisons like if a == b or a == c or a == d: fairly frequently. At some point I discovered that these could easily be shortened to if a in {b, c, d}: or, if the values aren't hashable, if a in (b, c, d):. However, I have never seen such a construction in anyone else's code. This is probably because either:
1. The == way is slower.
2. The == way is more pythonic.
... the in way except me.

Which reason, if any, is it?
For simple values (i.e. not expressions or NaNs), if a == b or a == c and if a in <iterable of b and c> are equivalent.
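To make the two exceptions concrete, here is a minimal sketch of where the forms diverge. The helper name expensive and the variable names are made up for illustration. It relies on two documented behaviours: membership tests on built-in containers check identity before equality, and or short-circuits while a tuple literal evaluates all of its elements up front.

```python
nan = float("nan")

# Chained == relies on equality alone, and NaN never compares equal to itself.
chained = (nan == nan) or (nan == 1.0)

# `in` on a tuple checks identity first, so the very same NaN object is found.
contained = nan in (nan, 1.0)

calls = []

def expensive(value):
    """Hypothetical stand-in for an expression with a cost or side effect."""
    calls.append(value)
    return value

a = 1

# `or` short-circuits: expensive(2) is never called once the first test succeeds.
short_circuit = a == expensive(1) or a == expensive(2)
calls_with_or = list(calls)

calls.clear()

# The tuple is built before the membership test, so both calls run.
eager = a in (expensive(1), expensive(2))
calls_with_in = list(calls)

print(chained, contained)            # False True
print(calls_with_or, calls_with_in)  # [1] [1, 2]
```

So the rewrite is safe for plain values, but it can change behaviour when the operands are NaN or when evaluating them is costly or has side effects.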
If the values are hashable, it's better to use in with a set literal instead of tuple or list literals:
if a in {b, c}: ...
When the elements are constants, CPython's peephole optimiser can replace the literal with a cached frozenset() object, and membership tests against sets are O(1) operations on average.
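One way to check this on your own interpreter is to look at the constants stored in the compiled function. The sketch below is illustrative only: the function names are invented, and is_one_of is a hypothetical helper showing the tuple fallback for unhashable values, not a standard library function.

```python
def literal_constants(a):
    # Every element is a constant, so the compiler may store one frozenset.
    return a in {1, 2, 3}

def literal_variables(a, b, c):
    # Elements are variables, so the set has to be built on every call.
    return a in {b, c}

def has_frozenset_constant(func):
    """Return True if the compiled function stores a frozenset constant."""
    return any(isinstance(const, frozenset) for const in func.__code__.co_consts)

def is_one_of(a, candidates):
    """Hypothetical helper: set lookup when possible, linear scan otherwise."""
    try:
        return a in set(candidates)
    except TypeError:
        # Raised when `a` or one of the candidates is unhashable (e.g. a list).
        return a in tuple(candidates)

if __name__ == "__main__":
    # Result for the constant literal depends on the interpreter and version.
    print(has_frozenset_constant(literal_constants))
    print(has_frozenset_constant(literal_variables))  # False
    print(is_one_of([1], [[1], [2]]))                 # True
```

The caching only applies to literals made of constants. With variables such as b and c, the set is rebuilt each time the line runs, which is the construction cost measured in the timings further down.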
Performance-wise, "in" is better:
>>> import timeit
>>> timeit.timeit("pub='1'; pub == 1 or pub == '1'")
0.07568907737731934
>>> timeit.timeit("pub='1'; pub in[1, '1']")
0.04272890090942383
>>> timeit.timeit("pub=1; pub == 1 or pub == '1'")
0.07502007484436035
>>> timeit.timeit("pub=1; pub in[1, '1']")
0.07035684585571289
Also, "in" keeps the code from repeating itself: a == 1 or a == 2 repeats the variable and is harder to read, while "in" makes the intent much easier to understand. This is a simple yet elegant coding practice. In short, we should use "in" more often if we are not already doing so.
I was curious about the timing difference between a straight comparison and checking membership in the array.
Conclusion: constructing the array is not free, and that cost must be taken into account when comparing speeds.
If the array is constructed at the time of the comparison, the in test is slower than the simple comparison, so the simple comparison is faster whether in or out of a loop.
That said, if the array is already constructed, checking membership in it inside a large loop is faster than doing the simple comparison.
$ speed.py
inarray x 1000000: 0.277590343844
comparison x 1000000: 0.347808290754
makearray x 1000000: 0.408771123295
import timeit

NUM = 1000000  # iterations per timed function

a = 1
b = 2
c = 3
d = 1

# Containers built once, ahead of the timed tests
array = {b, c, d}  # a set, despite the name
tup = (b, c, d)
lst = [b, c, d]


def comparison():
    if a == b or a == c or a == d:
        pass


def makearray():
    # Builds the set on every call
    if a in {b, c, d}:
        pass


def inarray():
    if a in array:
        pass


def maketuple():
    # Builds the tuple on every call
    if a in (b, c, d):
        pass


def intuple():
    if a in tup:
        pass


def makelist():
    # Builds the list on every call
    if a in [b, c, d]:
        pass


def inlist():
    if a in lst:
        pass


def time_all(funcs, params=None):
    timers = []
    for func in funcs:
        if params:
            tx = timeit.Timer(lambda: func(*params))
        else:
            tx = timeit.Timer(lambda: func())
        timers.append([func, tx.timeit(NUM)])

    # Report the fastest function first
    for func, speed in sorted(timers, key=lambda x: x[1]):
        print("{fn:<25} x {n}: {t}".format(fn=func.__name__, n=NUM, t=speed))
    print("")


time_all([comparison, makearray, inarray, intuple, maketuple, inlist, makelist])
This doesn't quite answer your question about why you don't often see the comparison written with in. I would be speculating, but it's likely a mix of 1, 2, 4, and the situation in which the author needed to write that particular bit of code.
I've personally used both methods depending on the situation. The choice usually came down to speed or simplicity.
Edit:
@bracco23 is right: there are slight differences, and the timing changes depending on whether a tuple, an array, or a list is used.
$ speed.py
inarray x 1000000: 0.260784980761
intuple x 1000000: 0.288696420718
inlist x 1000000: 0.311479982167
maketuple x 1000000: 0.356532747578
comparison x 1000000: 0.360010093964
makearray x 1000000: 0.41094386108
makelist x 1000000: 0.433603059099
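For anyone who wants to rerun the core of this comparison on a current interpreter, here is a shorter sketch that passes statement strings to timeit.timeit. The function name time_membership and the labels are made up for illustration. Note that timeit compiles the setup and statement into one function, so the names are local variables here rather than module globals as in the script above, and absolute numbers will differ from the ones shown.

```python
import timeit

def time_membership(number=100_000):
    """Time three spellings of the same test; returns seconds per spelling."""
    setup = "a = 1; b = 2; c = 3; d = 1; prebuilt = {b, c, d}"
    statements = {
        "comparison": "a == b or a == c or a == d",
        "build set each time": "a in {b, c, d}",
        "prebuilt set": "a in prebuilt",
    }
    return {
        label: timeit.timeit(stmt, setup=setup, number=number)
        for label, stmt in statements.items()
    }

if __name__ == "__main__":
    # Print the fastest spelling first, as the script above does.
    results = time_membership()
    for label, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{label:<20} {seconds:.4f}")
```

Which spelling wins depends on the interpreter version, the number of values, and where in the sequence the match occurs, so it is worth measuring with values that resemble the real ones.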