I have two lists.
L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop'] # bigrams list
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always'] #trigrams list
Looking closely, L1 contains 'not worry' and 'good very', which are exact word-order reversals of 'worry not' and 'very good'.
I need to remove such reversed elements from the list. Similarly, in L2 'happy be always' is a reversal of 'always be happy' and should be removed as well.
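Here "reverse" means reversing the word order, not the characters ('worry not'[::-1] would give 'ton yrrow'). A minimal sketch of that operation, using a hypothetical helper name `reverse_words`, shows that the same code handles bigrams and trigrams alike:

```python
def reverse_words(phrase: str) -> str:
    # Split on whitespace, reverse the list of words, and re-join with single spaces.
    # Works for any n-gram length, so bigrams and trigrams need no separate handling.
    return ' '.join(reversed(phrase.split()))

print(reverse_words('worry not'))        # not worry
print(reverse_words('always be happy'))  # happy be always
```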
The final output I'm looking for is:
L1 = ['worry not', 'be happy', 'very good', 'full stop']
L2 = ['take into account', 'always be happy', 'stay safe friend']
I tried the following solution:
[[max(zip(map(set, map(str.split, group)), group))[1]] for group in L1]
But it does not give the correct output. Should I write separate functions for removing reversed bigrams and reversed trigrams, or is there a faster, more Pythonic way to do this? I'll have to run it on 10K+ strings.
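You can do it with a list comprehension if you iterate over the list from the end: keep a phrase only if its word-reversed form does not appear later in the reversed list (that is, earlier in the original list), then reverse the result back to the original order. The same code works for both bigrams and trigrams.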
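lst = L1[::-1]  # or L2[::-1]; walk the list from the end
# keep s only if its word-reversed form does not occur earlier in the original list,
# then reverse the kept items back into their original order
x = [s for i, s in enumerate(lst) if ' '.join(s.split()[::-1]) not in lst[i+1:]][::-1]
# L1: ['worry not', 'be happy', 'very good', 'full stop']
# L2: ['take into account', 'always be happy', 'stay safe friend']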
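For 10K+ strings, note that `not in lst[i+1:]` copies a slice and scans it linearly for every element, so the list comprehension is quadratic in the list length. A sketch of a linear-time alternative (the function name `remove_reversed` is hypothetical, not from the answer above) applies the same rule, dropping a phrase when its word-reversed form occurs earlier in the list, but uses a set of already-seen phrases for constant-time average lookups:

```python
def remove_reversed(phrases):
    """Drop every phrase whose word-reversed form appeared earlier in the list.

    Exact duplicates are kept, matching the list-comprehension version.
    """
    seen = set()   # every phrase encountered so far, kept or not
    result = []
    for phrase in phrases:
        reversed_phrase = ' '.join(reversed(phrase.split()))
        if reversed_phrase not in seen:
            result.append(phrase)
        seen.add(phrase)
    return result


bigrams = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop']
trigrams = ['take into account', 'always be happy', 'stay safe friend', 'happy be always']

print(remove_reversed(bigrams))   # ['worry not', 'be happy', 'very good', 'full stop']
print(remove_reversed(trigrams))  # ['take into account', 'always be happy', 'stay safe friend']
```

Every phrase is added to `seen` even when it is dropped, so a phrase is removed whenever its reversal appears anywhere before it, which is the same rule the reversed-list comprehension applies.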