How to efficiently find the indices of matching elements in two lists

1 Answers

Without duplicates

If your objects are hashable and your lists have no duplicates, you can create an inverted index of the first list and then traverse the second list. This traverses each list only once and thus is O(n).

def find_matching_index(list1, list2):

    inverse_index = { element: index for index, element in enumerate(list1) }

    return [(index, inverse_index[element])
        for index, element in enumerate(list2) if element in inverse_index]

find_matching_index([1,2,3], [3,2,1]) # [(0, 2), (1, 1), (2, 0)]

With duplicates

You can extend the previous solution to account for duplicates. You can keep track of multiple indices with a set.

def find_matching_index(list1, list2):

    # Create an inverse index which keys are now sets
    inverse_index = {}

    for index, element in enumerate(list1):

        if element not in inverse_index:
            inverse_index[element] = {index}

        else:
            inverse_index[element].add(index)

    # Traverse the second list    
    matching_index = []

    for index, element in enumerate(list2):

        # We have to create one pair by element in the set of the inverse index
        if element in inverse_index:
            matching_index.extend([(x, index) for x in inverse_index[element]])

    return matching_index

find_matching_index([1, 1, 2], [2, 2, 1]) # [(2, 0), (2, 1), (0, 2), (1, 2)]

Unfortunately, this is no longer O(n). Consider the case where you input [1, 1] and [1, 1], the output is [(0, 0), (0, 1), (1, 0), (1, 1)]. Thus by the size of the output, the worst case cannot be better than O(n^2).

Although, this solution is still O(n) if there are no duplicates.

Non-hashable objects

Now comes the case where your objects are not hashable, but comparable. The idea here will be to sort your lists in a way that preserves the origin index of each element. Then we can group sequences of elements that are equal to get matching indices.

Since we make heavy use of groupby and product in the following code, I made find_matching_index return a generator for memory efficiency on long lists.

from itertools import groupby, product

def find_matching_index(list1, list2):
    sorted_list1 = sorted((element, index) for index, element in enumerate(list1))
    sorted_list2 = sorted((element, index) for index, element in enumerate(list2))

    list1_groups = groupby(sorted_list1, key=lambda pair: pair[0])
    list2_groups = groupby(sorted_list2, key=lambda pair: pair[0])

    for element1, group1 in list1_groups:
        try:
            element2, group2 = next(list2_groups)
            while element1 > element2:
                (element2, _), group2 = next(list2_groups)

        except StopIteration:
            break

        if element2 > element1:
            continue

        indices_product = product((i for _, i in group1), (i for _, i in group2), repeat=1)

        yield from indices_product

        # In version prior to 3.3, the above line must be
        # for x in indices_product:
        #     yield x

list1 = [[], [1, 2], []]
list2 = [[1, 2], []]

list(find_matching_index(list1, list2)) # [(0, 1), (2, 1), (1, 0)]

It turns out that time complexity does not suffer that much. Sorting of course takes O(n log(n)), but then groupby provides generators that can recover all elements by traversing our lists only twice. The conclusion is that our complexity is primarly bound by the size of the output of product. Thus giving a best case where the algorithm is O(n log(n)) and a worst case that is once again O(n^2).

106

answered Oct 05 '22 02:10

Olivier Melançon

Related questions
                            
                                How can I keep test data after Django tests complete?
                            
                                Memory efficient sort of massive numpy array in Python
                            
                                What is the difference between skew and kurtosis functions in pandas vs. scipy?
                            
                                ValueError: setting an array element with a sequence. for Pandas
                            
                                Reorder levels of MultiIndex in a pandas DataFrame
                            
                                How to replace all values in a Pandas Dataframe not in a list? [duplicate]
                            
                                Using Boto3 to interact with amazon Aurora on RDS
                            
                                Average of a numpy array returns NaN
                            
                                overcome Graphdef cannot be larger than 2GB in tensorflow
                            
                                interpolate missing values 2d python
                            
                                How to remove the extra row (or column) after transpose() in Pandas
                            
                                Google Search Web Scraping with Python
                            
                                How can I slice each element of a numpy array of strings?
                            
                                Difference between '[:]' and '[::]' slicing when copying a list?
                            
                                No module named urllib3
                            
                                Python subprocess.call not waiting for process to finish blender
                            
                                pandas groupby where you get the max of one column and the min of another column
                            
                                Python error when calling NumPy from class method with map
                            
                                Tox WARNING:test command found but not installed in testenv
                            
                                Not able to upload local files in google colab

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to efficiently find the indices of matching elements in two lists

Tags:

python

algorithm

matching

Haoran

People also ask