Python: Fastest Way to compare arrays elementwise

Tags:

python

arrays

numpy

I am looking for the fastest way to output the index of the first difference of two arrays in Python. For example, take the following arrays:

test1 = [1, 3, 5, 8]
test2 = [1]
test3 = [1, 3]

Comparing test1 and test2, I would like to output 1, while the comparison of test1 and test3 should output 2.

In other words, I am looking for an equivalent of the following statement:

import numpy as np
# Requires test1 and test2 to be NumPy arrays of the same length
np.where(np.where(test1 == test2, test1, 0) == 0)[0][0]

that also works with varying array lengths.

Any help is appreciated.

228

asked May 10 '15 17:05

Andy

3 Answers

For lists this works:

from itertools import zip_longest

def find_first_diff(list1, list2):
    for index, (x, y) in enumerate(zip_longest(list1, list2, 
                                               fillvalue=object())):
        if x != y:
            return index

zip_longest pads the shorter list with None or with a provided fill value. The standard zip stops at the end of the shorter list, so it misses differences caused only by different list lengths rather than by different values.

On Python 2 use izip_longest.
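To see the difference between zip and zip_longest concretely, here is a small comparison using the values of test1 and test3 from the question (the names a and b are just for illustration):

```python
from itertools import zip_longest

a = [1, 3, 5, 8]
b = [1, 3]

# zip stops at the shorter input, so only two pairs are produced
print(list(zip(a, b)))          # [(1, 1), (3, 3)]

# zip_longest keeps going and pads the shorter input (default fillvalue is None)
print(list(zip_longest(a, b)))  # [(1, 1), (3, 3), (5, None), (8, None)]
```

With plain zip the loop would never see index 2, which is exactly where test1 and test3 start to differ.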

Update: I now use a unique fill value to avoid potential problems when None appears as a list value. Every object() is unique:

>>> o1 = object()
>>> o2 = object()
>>> o1 == o2
False
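A short sketch of the problem the sentinel avoids, assuming a list that legitimately contains None. Both functions are hypothetical variants of find_first_diff that differ only in the fill value:

```python
from itertools import zip_longest

def first_diff_none_fill(list1, list2):
    # Pads with None: a real None in the longer list looks like padding
    for index, (x, y) in enumerate(zip_longest(list1, list2, fillvalue=None)):
        if x != y:
            return index

def first_diff_sentinel(list1, list2):
    sentinel = object()  # a fresh object equal only to itself
    for index, (x, y) in enumerate(zip_longest(list1, list2, fillvalue=sentinel)):
        if x != y:
            return index

print(first_diff_none_fill([1, None], [1]))  # None: the length difference is missed
print(first_diff_sentinel([1, None], [1]))   # 1
```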

This pure Python approach might be faster than a NumPy solution. This depends on the actual data and other circumstances.

Converting a list into a NumPy array also takes time. This might actually take longer than finding the index with the function above. If you are not going to use the NumPy array for other calculations, the conversion might cause considerable overhead.
NumPy always searches the full array. If the difference comes early, you do a lot more work than you need to.
NumPy creates a bunch of intermediate arrays. This costs memory and time.
NumPy needs to construct intermediate arrays with the maximum length. Comparing many small arrays with very large ones is unfavorable here.

In general, NumPy is faster than a pure Python solution in many cases. But each case is a bit different, and there are situations where pure Python is faster.
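The "NumPy always searches the full array" and "intermediate arrays" drawbacks listed above suggest a compromise that is not part of this answer: compare the overlapping part in fixed-size chunks and stop at the first chunk that contains a mismatch. A minimal sketch, assuming inputs that np.asarray can convert; first_diff_chunked and the default chunk size of 4096 are hypothetical choices, not tuned values:

```python
import numpy as np

def first_diff_chunked(a, b, chunk=4096):
    # Converting lists still costs time, as noted above
    a = np.asarray(a)
    b = np.asarray(b)
    n = min(len(a), len(b))
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Only a chunk-sized boolean array is created per step
        mismatch = a[start:stop] != b[start:stop]
        if mismatch.any():
            # argmax on a boolean array gives the first True
            return start + int(np.argmax(mismatch))
    # Overlap is identical: differ only if the lengths differ
    return n if len(a) != len(b) else None
```

Smaller chunks stop sooner when the difference comes early but add more Python-level loop overhead; whether this beats the plain loop or a full np.where scan depends on the data.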

126

answered Sep 29 '22 16:09

Mike Müller

With NumPy arrays (which will be faster for big arrays), you could check the lengths and also compare the overlapping parts, slicing the longer array to the length of the shorter one, something like this:

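import numpy as np

# Compare only the overlapping part of the two arrays
n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
    ans = x[0]    # first mismatch inside the overlap
elif len(test1) != len(test2):
    ans = n       # overlap matches; the arrays differ in length
else:
    ans = None    # the arrays are identical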
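Wrapped into a reusable function, the snippet above might look like the following sketch. The name first_diff_numpy is hypothetical, and the np.asarray calls are an added assumption so that plain lists are accepted too (at the cost of a conversion on every call):

```python
import numpy as np

def first_diff_numpy(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    n = min(len(a), len(b))
    # Indices where the overlapping parts disagree
    x = np.where(a[:n] != b[:n])[0]
    if len(x) > 0:
        return int(x[0])
    if len(a) != len(b):
        return n      # shorter array is a prefix of the longer one
    return None       # identical arrays
```

With the question's data, first_diff_numpy(test1, test2) gives 1 and first_diff_numpy(test1, test3) gives 2.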

EDIT: Despite this being voted down, I will leave my answer up here in case someone else needs to do something similar.

If the starting arrays are large and already NumPy arrays, this is the fastest method. I also had to modify Andy's code to get it to work. Timings, in this order: 1. my suggestion, 2. Paidric's (now removed, but the most elegant), 3. Andy's accepted answer, 4. zip (non-NumPy), 5. vanilla Python without zip, as per @leekaiinthesky:

0.1ms, 9.6ms, 0.6ms, 2.8ms, 2.3ms

If the conversion to ndarray is included in the timing, the non-NumPy, no-zip method is fastest:

7.1ms, 17.1ms, 7.7ms, 2.8ms, 2.3ms

and even more so if the difference between the two lists is at around index 1,000 rather than 10,000:

7.1ms, 17.1ms, 7.7ms, 0.3ms, 0.2ms

import timeit

setup = """
import numpy as np
from itertools import zip_longest
list1 = [1 for i in range(10000)] + [4, 5, 7]
list2 = [1 for i in range(10000)] + [4, 4]
test1 = np.array(list1)
test2 = np.array(list2)

def find_first_diff(l1, l2):
    for index, (x, y) in enumerate(zip_longest(l1, l2, fillvalue=object())):
        if x != y:
            return index

def findFirstDifference(list1, list2):
  minLength = min(len(list1), len(list2))
  for index in range(minLength):
    if list1[index] != list2[index]:
      return index
  return minLength
"""

fn = ["""
n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
  ans = x[0]
elif len(test1) != len(test2):
  ans = n
else:
  ans = None""",
"""
# np.isin supersedes np.in1d; note that it tests membership, not position
x = np.where(np.isin(list1, list2) == False)[0]
if len(x) > 0:
  ans = x[0]
else:
  ans = None""",
"""
x = test1
y = np.resize(test2, x.shape)
x = np.where(np.where(x == y, x, 0) == 0)[0]
if len(x) > 0:
  ans = x[0]
else:
  ans = None""",
"""
ans = find_first_diff(list1, list2)""",
"""
ans = findFirstDifference(list1, list2)"""]

for f in fn:
  print(timeit.timeit(f, setup, number = 1000))

answered Sep 29 '22 14:09

paddyg

Here is one way to do it:

def compare_lists(lista, listb):
    """
    Compare two lists and return the first index where they differ. If
    one list is a prefix of the other (or they are equal), return the
    length of the shorter list.
    """
    # zip is lazy on Python 3; on Python 2 use itertools.izip instead
    for position, (a, b) in enumerate(zip(lista, listb)):
        if a != b:
            return position
    return min(len(lista), len(listb))

The algorithm is simple: zip the two lists (on Python 2, the more efficient itertools.izip), then compare them element by element.
The enumerate function gives the index position, which we can return if a discrepancy is found.
If we exit the for loop without returning, one of two things has happened:
1. The two lists are identical. In this case, we want to return the length of either list.
2. The lists are of different lengths and are equal up to the length of the shorter list. In this case, we want to return the length of the shorter list.
In either case, the min(...) expression is what we want.
This function has a bug: if you compare two empty lists, it returns 0, which seems wrong. I'll leave it to you to fix it as an exercise.
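One possible answer to the exercise, assuming that "no difference" (including two empty lists) should be reported as None rather than as a length. compare_lists_fixed is a hypothetical name:

```python
def compare_lists_fixed(lista, listb):
    for position, (a, b) in enumerate(zip(lista, listb)):
        if a != b:
            return position
    if len(lista) == len(listb):
        return None  # identical lists, including two empty lists
    # One list is a prefix of the other: they differ at the shorter length
    return min(len(lista), len(listb))
```

With this convention, a return value of 0 always means the lists really differ at index 0, for example [] versus [1].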

answered Sep 29 '22 15:09

Hai Vu
