I have a (very large) list similar to:
a = ['A', 'B', 'A', 'B', 'A', 'C', 'D', 'E', 'D', 'E', 'D', 'F', 'G', 'A', 'B']
and I want to extract from it a list of lists like:
result = [['A', 'B', 'A', 'B', 'A'], ['D', 'E', 'D', 'E', 'D']]
The repeating patterns can differ; for example, there can also be intervals such as:
['A', 'B', 'C', 'A', 'D', 'E', 'A'] (with a 'jump' over two elements)
I have written some very simple code that seems to work:
```python
tolerance = 2
counter = 0
start, stop = 0, 0
for idx in range(len(a) - 1):
    if a[idx] == a[idx+1] and counter == 0:
        start = idx
        counter += 1
    elif a[idx] == a[idx+1] and counter != 0:
        if tolerance <= 0:
            stop = idx
            tolerance = 2
    elif a[idx] != a[idx+1]:
        tolerance -= 1
if start != 0 and stop != 0:
    result = [a[start::stop]]
```
But 1) it is very cumbersome and 2) I need to apply it to very large lists. Is there a more concise and faster way to implement it?
EDIT: As @Kasramvd correctly pointed out, I need the largest sublist that satisfies the requirement (at most a tolerance number of jumps between the start and end elements), so I take:
['A', 'B', 'A', 'B', 'A'] instead of ['B', 'A', 'B']
because the former includes the latter.
It would also be good if the code could select elements UP TO a certain tolerance. For example, if the tolerance (the maximum number of elements not equal to the start or end element) is 2, it should also return sublists such as:
['A', 'A', 'A', 'B', 'A', 'B', 'A', 'C', 'D', 'A']
which contains jumps of 0, 1 and 2 elements.
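To pin down what "tolerance" means here, the gaps in the examples above can be computed directly. The sketch below is an illustration only: `gaps_between` is a hypothetical helper name, and it assumes tolerance counts the elements jumped over between two consecutive occurrences of the start/end element.

```python
def gaps_between(sub, value):
    """Return the size of each gap between consecutive occurrences of value in sub.

    Hypothetical helper, for illustration only. It assumes that "tolerance" means
    the number of elements jumped over between two occurrences of the start/end element.
    """
    positions = [i for i, x in enumerate(sub) if x == value]
    # A gap is the number of elements strictly between two neighbouring occurrences
    return [q - p - 1 for p, q in zip(positions, positions[1:])]


pattern = ['A', 'A', 'A', 'B', 'A', 'B', 'A', 'C', 'D', 'A']
print(gaps_between(pattern, 'A'))  # [0, 0, 1, 1, 2]
print(gaps_between(['A', 'B', 'C', 'A', 'D', 'E', 'A'], 'A'))  # [2, 2]
```

Under this reading, a candidate sublist that starts and ends with the same element fits a given tolerance when `max(gaps_between(sub, sub[0]), default=0) <= tolerance`.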
Solution without any extra copying of lists other than the sublist results:
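```python
def sublists(a, tolerance):
    result = []
    index = 0
    while index < len(a):
        curr = a[index]
        # Extend the sublist while the gap after the last occurrence of curr stays within tolerance
        for i in range(index, len(a)):
            if a[i] == curr:
                end = i  # last position where curr was seen
            elif i - end > tolerance:
                break
        # Keep only sublists in which curr occurs at least twice
        if index != end:
            result.append(a[index:end+1])
        index += end - index + 1  # i.e. continue right after the sublist
    return result
```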
Usage is simply as follows:
```python
a = ['A', 'B', 'A', 'B', 'A', 'C', 'D', 'E', 'D', 'E', 'D', 'F', 'G', 'A', 'B']
sublists(a, 0)  # []
sublists(a, 1)  # [['A', 'B', 'A', 'B', 'A'], ['D', 'E', 'D', 'E', 'D']]
sublists(a, 2)  # [['A', 'B', 'A', 'B', 'A'], ['D', 'E', 'D', 'E', 'D']]
```
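Because the question stresses very large lists, one variation worth considering is to yield index ranges lazily instead of building a list of copied sublists. This is a sketch, not part of the original answer: `sublist_bounds` is a hypothetical name, and it assumes the same greedy semantics as `sublists()` above.

```python
from typing import Any, Iterator, Sequence, Tuple


def sublist_bounds(a: Sequence[Any], tolerance: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) ranges so that a[start:stop] is a matching sublist.

    Hypothetical generator variant of sublists(): same greedy scan, but ranges are
    produced lazily and no sublist is copied.
    """
    n = len(a)
    index = 0
    while index < n:
        curr = a[index]
        end = index  # last position where curr was seen
        for i in range(index + 1, n):
            if a[i] == curr:
                end = i
            elif i - end > tolerance:
                break  # the gap after the last occurrence exceeds tolerance
        if end != index:
            yield index, end + 1
        index = end + 1  # resume right after the current sublist


a = ['A', 'B', 'A', 'B', 'A', 'C', 'D', 'E', 'D', 'E', 'D', 'F', 'G', 'A', 'B']
print(list(sublist_bounds(a, 1)))  # [(0, 5), (6, 11)]
print([a[start:stop] for start, stop in sublist_bounds(a, 1)])
```

Each step re-scans at most tolerance + 1 elements past the end of the previous match, so the total work is roughly O(n * (tolerance + 1)). Callers that only need positions, or that process matches one at a time, avoid holding every sublist in memory.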
Possible solution to the extra requirement specified in the comments:
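```python
# Replaces the inner for loop of sublists(): stop before two consecutive copies of curr
for i in range(index, len(a)):
    if i > index and a[i] == a[i-1] == curr:
        end = i - 1
        break
    elif a[i] == curr:
        end = i
    elif i - end > tolerance:
        break
```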
Note: I've not tested this thoroughly.
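The fragment above only replaces the inner `for` loop, so it cannot be run on its own. Below, it is dropped into the complete function as a sketch; the name `sublists_split_runs` is hypothetical, and, like the fragment, it has not been tested thoroughly.

```python
def sublists_split_runs(a, tolerance):
    """sublists() with the inner loop replaced by the fragment above.

    Hypothetical name; like the fragment itself, this is a sketch that has not
    been tested thoroughly.
    """
    result = []
    index = 0
    while index < len(a):
        curr = a[index]
        for i in range(index, len(a)):
            if i > index and a[i] == a[i - 1] == curr:
                end = i - 1  # stop before two consecutive copies of curr
                break
            elif a[i] == curr:
                end = i
            elif i - end > tolerance:
                break
        if index != end:
            result.append(a[index:end + 1])
        index = end + 1
    return result


print(sublists_split_runs(['A', 'B', 'A', 'A'], 1))  # [['A', 'B', 'A']]
```

On the question's list `a`, which has no adjacent duplicates, this gives the same result as `sublists()`. The difference shows up with repeats: on `['A', 'B', 'A', 'A']` with tolerance 1, `sublists()` returns `[['A', 'B', 'A', 'A']]`, while this variant stops before the repeated `'A'`.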