If I have two lists, each 800 elements long and filled with integers. Is there a faster way to compare that they have the exact same elements (and short circuit if they don't) than using the built in ==
operator?
a = [6,2,3,88,54,-486]
b = [6,2,3,88,54,-486]
a == b
>>> True
Anything better than this?
I'm curious only because I have a giant list of lists to compare.
Python sort() method and == operator to compare lists We can club the Python sort() method with the == operator to compare two lists. Python sort() method is used to sort the input lists with a purpose that if the two input lists are equal, then the elements would reside at the same index positions.
sort() coupled with == operator can achieve this task. We first sort the list, so that if both the lists are identical, then they have elements at the same position.
Use == operator to check if two lists are exactly equal It is the easiest and quickest way to check if both the lists are exactly equal.
Let's not assume, but run some tests!
The set-up:
>>> import time
>>> def timeit(l1, l2, n):
start = time.time()
for i in xrange(n):
l1 == l2
end = time.time()
print "%d took %.2fs" % (n, end - start)
Two giant equal lists:
>>> hugeequal1 = [10]*30000
>>> hugeequal2 = [10]*30000
>>> timeit(hugeequal1, hugeequal2, 10000)
10000 took 3.07s
Two giant lists where the first element is not equal:
>>> easydiff1 = [10]*30000
>>> easydiff2 = [10]*30000
>>> easydiff2[0] = 0
>>> timeit(easydiff1, easydiff2, 10000)
10000 took 0.00s
>>> timeit(easydiff1, easydiff2, 1000000)
1000000 took 0.14s
So it appears the built-in list equality operator does indeed do the short-circuiting.
EDIT: Interestingly, using the array.array
module doesn't make it any faster:
>>> import array
>>> timeit(hugeequal1, hugeequal2, 1000)
1000 took 0.30s
>>> timeit(array.array('l', hugeequal1), array.array('l', hugeequal2), 1000)
1000 took 1.11s
numpy
does get you a good speed-up, though:
>>> import numpy
>>> timeit(hugeequal1, hugeequal2, 10000)
10000 took 3.01s
>>> timeit(numpy.array(hugeequal1), numpy.array(hugeequal2), 10000)
10000 took 1.11s
Numpy can speed this up 10x, and is particularly relevant since your lists are of a fix (integer) type.
In pure python, each comparison has to follow a reference to the next elements, check the types, etc. In numpy, only a pointer needs to be incremented.
Here's a comparison:
import numpy as np
from timeit import timeit
N = 10**7
p0 = list(range(N))
p1 = list(range(N))
n0 = np.arange(N)
n1 = np.arange(N)
number = 500
t = timeit("p0==p1", setup="from __main__ import p0, p1", number=number)
print "pure python time =", t/number
number = 500
t = timeit("(n0==n1).all()", setup="from __main__ import n0, n1", number=number)
print "numpy time =", t/number
And the result is 10x faster using numpy:
pure python time = 0.256077399254
numpy time = 0.0286148643494
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With