Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a faster way to test if two lists have the exact same elements than Pythons built in == operator?

Tags:

python

If I have two lists, each 800 elements long and filled with integers. Is there a faster way to compare that they have the exact same elements (and short circuit if they don't) than using the built in == operator?

a = [6,2,3,88,54,-486]
b = [6,2,3,88,54,-486]
a == b 
>>> True

Anything better than this?

I'm curious only because I have a giant list of lists to compare.

like image 603
Zack Yoshyaro Avatar asked Aug 23 '13 19:08

Zack Yoshyaro


People also ask

How do you check if two lists have the same elements Python?

Python sort() method and == operator to compare lists We can club the Python sort() method with the == operator to compare two lists. Python sort() method is used to sort the input lists with a purpose that if the two input lists are equal, then the elements would reside at the same index positions.

How do you check if a list is the same as another list Python?

sort() coupled with == operator can achieve this task. We first sort the list, so that if both the lists are identical, then they have elements at the same position.

How do you check if two lists are the same?

Use == operator to check if two lists are exactly equal It is the easiest and quickest way to check if both the lists are exactly equal.


2 Answers

Let's not assume, but run some tests!

The set-up:

>>> import time
>>> def timeit(l1, l2, n):
        start = time.time()
        for i in xrange(n):
                l1 == l2
        end = time.time()
        print "%d took %.2fs" % (n, end - start)

Two giant equal lists:

>>> hugeequal1 = [10]*30000
>>> hugeequal2 = [10]*30000
>>> timeit(hugeequal1, hugeequal2, 10000)
10000 took 3.07s

Two giant lists where the first element is not equal:

>>> easydiff1 = [10]*30000
>>> easydiff2 = [10]*30000
>>> easydiff2[0] = 0
>>> timeit(easydiff1, easydiff2, 10000)
10000 took 0.00s
>>> timeit(easydiff1, easydiff2, 1000000)
1000000 took 0.14s

So it appears the built-in list equality operator does indeed do the short-circuiting.

EDIT: Interestingly, using the array.array module doesn't make it any faster:

>>> import array
>>> timeit(hugeequal1, hugeequal2, 1000)
1000 took 0.30s
>>> timeit(array.array('l', hugeequal1), array.array('l', hugeequal2), 1000)
1000 took 1.11s

numpy does get you a good speed-up, though:

>>> import numpy
>>> timeit(hugeequal1, hugeequal2, 10000)
10000 took 3.01s
>>> timeit(numpy.array(hugeequal1), numpy.array(hugeequal2), 10000)
10000 took 1.11s
like image 83
Claudiu Avatar answered Oct 30 '22 04:10

Claudiu


Numpy can speed this up 10x, and is particularly relevant since your lists are of a fix (integer) type.

In pure python, each comparison has to follow a reference to the next elements, check the types, etc. In numpy, only a pointer needs to be incremented.

Here's a comparison:

import numpy as np
from timeit import timeit

N = 10**7

p0 = list(range(N))
p1 = list(range(N))

n0 = np.arange(N)
n1 = np.arange(N)

number = 500
t = timeit("p0==p1", setup="from __main__ import p0, p1", number=number)
print "pure python time =", t/number

number = 500
t = timeit("(n0==n1).all()", setup="from __main__ import n0, n1", number=number)
print "numpy time =", t/number

And the result is 10x faster using numpy:

pure python time = 0.256077399254
numpy time = 0.0286148643494
like image 22
tom10 Avatar answered Oct 30 '22 04:10

tom10