Python: finding a common sublist of a given length present in two lists

Question

I have to find an efficient python code to do the following:

Find, at least one if existing, sequence of n consecutive elements which is included in the two given lists.

For example, with n=3, the result for these two lists will be ['Tom', 'Sam', 'Jill']:

lst1 = ['John', 'Jim', 'Tom', 'Sam', 'Jill', 'Chris']
lst2 = ['Chris', 'John', 'Tom', 'Sam', 'Jill', 'Jim']

The sample code below does the trick, but it will take forever to do the same if I have to compare hundreds of thousands of rows/lists. Any advice on how to optimize the performance of this code for processing a very large amount of data will be greatly appreciated!

lst1 = ['John', 'Jim', 'Tom', 'Sam', 'Jill', 'Chris']
lst2 = ['Chris', 'John', 'Tom', 'Sam', 'Jill', 'Jim']
strNum = 3 #represents number of consecutive strings to search for
common_element_found = 'False'
common_elements = []

lst1length = len(lst1) - (strNum - 1)
lst2length = len(lst2) - (strNum - 1)
for x in range(lst1length):
    ConsecutiveStringX = lst1[x] + ' ' + lst1[x + 1] + ' ' + lst1[x + 2]
    for y in range(lst2length):
        ConsecutiveStringY = lst2[y] + ' ' + lst2[y + 1] + ' ' + lst2[y + 2]
        if ConsecutiveStringY == ConsecutiveStringX:
            common_element_found = 'True'
            common_elements.append(ConsecutiveStringY)
            print('Match found: ' + str(common_elements)) 
            break
    if common_element_found == 'True':
        common_element_found = 'False'
        break

rafaelc · Accepted Answer

IIUC,

consecs1 = [ tuple(lst1[i:i+3]) for i in range(0, len(lst1)-2)]
consecs2 = { tuple(lst2[i:i+3]) for i in range(0, len(lst2)-2)}

for c in consecs1: 
    if c in consecs2:
         print(c)

Output

('Tom', 'Sam', 'Jill')

Explanation: you can make a list of tuples for lst1, which are hashable objects, and check if they are in the set of tuples built from lst2 (which grants O(1) speed).

PS: Even though sets are unorded, order is guaranteed because the loop follows lst1 and not lst2 ordering.

Python: finding a common sublist of a given length present in two lists

Tags:

python

algorithm

string-search

dsv2kx

1 Answers

rafaelc

Recent Activity

Donate For Us

Python: finding a common sublist of a given length present in two lists

Tags:

python

algorithm

string-search

dsv2kx

1 Answers

rafaelc

Related questions

Recent Activity

Donate For Us