I have to find an efficient python code to do the following:
Find, at least one if existing, sequence of n consecutive elements which is included in the two given lists.
For example, with n=3, the result for these two lists will be ['Tom', 'Sam', 'Jill']:
lst1 = ['John', 'Jim', 'Tom', 'Sam', 'Jill', 'Chris']
lst2 = ['Chris', 'John', 'Tom', 'Sam', 'Jill', 'Jim']
The sample code below does the trick, but it will take forever to do the same if I have to compare hundreds of thousands of rows/lists. Any advice on how to optimize the performance of this code for processing a very large amount of data will be greatly appreciated!
lst1 = ['John', 'Jim', 'Tom', 'Sam', 'Jill', 'Chris']
lst2 = ['Chris', 'John', 'Tom', 'Sam', 'Jill', 'Jim']
strNum = 3 #represents number of consecutive strings to search for
common_element_found = 'False'
common_elements = []
lst1length = len(lst1) - (strNum - 1)
lst2length = len(lst2) - (strNum - 1)
for x in range(lst1length):
ConsecutiveStringX = lst1[x] + ' ' + lst1[x + 1] + ' ' + lst1[x + 2]
for y in range(lst2length):
ConsecutiveStringY = lst2[y] + ' ' + lst2[y + 1] + ' ' + lst2[y + 2]
if ConsecutiveStringY == ConsecutiveStringX:
common_element_found = 'True'
common_elements.append(ConsecutiveStringY)
print('Match found: ' + str(common_elements))
break
if common_element_found == 'True':
common_element_found = 'False'
break
IIUC,
consecs1 = [ tuple(lst1[i:i+3]) for i in range(0, len(lst1)-2)]
consecs2 = { tuple(lst2[i:i+3]) for i in range(0, len(lst2)-2)}
for c in consecs1:
if c in consecs2:
print(c)
Output
('Tom', 'Sam', 'Jill')
Explanation: you can make a list of tuples for lst1, which are hashable objects, and check if they are in the set of tuples built from lst2 (which grants O(1) speed).
PS: Even though sets are unorded, order is guaranteed because the loop follows lst1 and not lst2 ordering.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With