The problem is as follows. I have a list of strings
lst1=['puffing','his','first','cigarette','in', 'weeks', 'in', 'weeks']
and I would like to obtain the list
lst2=['puffing','his','first','cigarette','in weeks', 'in weeks']
that is, to concatenate every occurrence of the sublist ['in', 'weeks'] into a single string,
for reasons that are irrelevant here. I tried the code below, where find_sub_list1
is taken from elsewhere (and included in the code):
npis = [['in', 'weeks'], ['in', 'ages']]

# given a candidate sublist and a list, return the indices of the first and last
# element of each occurrence of the sublist within the list
def find_sub_list1(sl,l):
    results=[]
    sll=len(sl)
    for ind in (i for i,e in enumerate(l) if e==sl[0]):
        if l[ind:ind+sll]==sl:
            results.append((ind,ind+sll-1))
    return results

def concatenator(sent, npis):
    indices = []
    for npi in npis:
        indices_temp = find_sub_list1(npi, sent)
        if indices_temp != []:
            indices.extend(indices_temp)
    sorted(indices, key=lambda x: x[0])
    for (a,b) in indices:
        diff = b - a
        sent[a:b+1] = [" ".join(sent[a:b+1])]
        del indices[0]
        indices = [(a - diff, b - diff) for (a,b) in indices]
    return sent
Instead of the desired lst2,
this code returns:
concatenator(lst1, npis)
>>['puffing','his','first','cigarette','in weeks', 'in', 'weeks']
so it only concatenates the first occurrence. Any ideas about where the code is failing?
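A likely culprit is the second loop in concatenator: `for (a,b) in indices` keeps iterating over the original list object, `del indices[0]` shrinks that object while it is being iterated, and the re-assignment on the next line binds the name to a new list that the running loop never sees. With two matches, the loop therefore stops after the first one. (Separately, the result of `sorted(...)` is discarded, so that line has no effect.) One way around the index bookkeeping is to replace matches from right to left, so that earlier indices stay valid. The following is only a sketch under that idea; `find_sub_list` and `concatenator_fixed` are hypothetical names, and it assumes the matched phrases do not overlap.

```python
def find_sub_list(sub, seq):
    """Return (start, end) index pairs for every occurrence of sub in seq."""
    n = len(sub)
    return [(i, i + n - 1)
            for i in range(len(seq) - n + 1)
            if seq[i:i + n] == sub]


def concatenator_fixed(sent, npis):
    """Join every occurrence of each phrase in npis into a single string."""
    result = list(sent)  # work on a copy so the caller's list is untouched
    indices = []
    for npi in npis:
        indices.extend(find_sub_list(npi, result))
    # Replace from right to left: a replacement only shifts positions to its
    # right, so the indices still waiting to be processed remain correct.
    for a, b in sorted(indices, reverse=True):
        result[a:b + 1] = [" ".join(result[a:b + 1])]
    return result


lst1 = ['puffing', 'his', 'first', 'cigarette', 'in', 'weeks', 'in', 'weeks']
npis = [['in', 'weeks'], ['in', 'ages']]
print(concatenator_fixed(lst1, npis))
```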
You can concatenate a list of strings into a single string with the string method join(). Call join() on the string you want inserted between the elements and pass it the list of strings, i.e. 'separator'.join(list_of_strings). If you use an empty string '', the elements are simply concatenated, and if you use a comma ',', the result is a comma-delimited string.
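As a quick illustration of the three separators mentioned above (the variable names are just for this example):

```python
words = ['in', 'weeks']

spaced = ' '.join(words)  # space between the elements
glued = ''.join(words)    # nothing between the elements
comma = ','.join(words)   # comma-delimited

print(spaced, glued, comma)
```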
Concatenation is the process of appending one string to the end of another string. You concatenate strings by using the + operator. For string literals and string constants, concatenation occurs at compile time; no run-time concatenation occurs. For string variables, concatenation occurs only at run time.
The most conventional way to concatenate lists is the “+” operator, which appends the whole of one list to the end of another and returns the result as a new list.
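A small example of list concatenation with “+”, using made-up variable names; note that neither operand is modified:

```python
head = ['puffing', 'his']
tail = ['first', 'cigarette']

combined = head + tail  # a new list; head and tail are left unchanged

print(combined)
```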
Since the desired sub-sequence is 'in' 'weeks' (and possibly 'in' 'ages'),
one possible solution could be the following (the looping is not very elegant, though):
First, find all positions where 'in' occurs.
Then iterate through the source list, appending elements to the target list and treating the positions of 'in' specially: if the following word is in a special set, join the two, append the result to the target, and advance the iterator one extra time.
Once the source list is exhausted, an IndexError will be thrown, indicating that we should break the loop.
code:
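# positions of every 'in' in the source list
index_in = [i for i, word in enumerate(lst1) if word == 'in']
lst2 = []
n = 0
while True:
    try:
        # join 'in' with the next word when that word is in the special set
        if n in index_in and n + 1 < len(lst1) and lst1[n+1] in ['weeks', 'ages']:
            lst2.append(lst1[n] + ' ' + lst1[n+1])
            n += 1  # skip the word that was just merged
        else:
            lst2.append(lst1[n])
        n += 1
    except IndexError:
        # lst1[n] is past the end: the source list is exhausted
        break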
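If the set of phrases may grow beyond 'in weeks' and 'in ages', the same single-pass idea can be written without the IndexError trick. This is only a sketch: `merge_phrases` is a hypothetical helper, and it assumes that when several phrases could start at the same position the longest one should win.

```python
def merge_phrases(words, phrases):
    """Return a new list in which every occurrence of a phrase is one string."""
    # Try longer phrases first; skip empty phrases, which would never advance.
    candidates = sorted((tuple(p) for p in phrases if p), key=len, reverse=True)
    merged = []
    i = 0
    while i < len(words):
        for phrase in candidates:
            if tuple(words[i:i + len(phrase)]) == phrase:
                merged.append(' '.join(phrase))
                i += len(phrase)  # jump past the whole phrase
                break
        else:
            # no phrase starts here: copy the word unchanged
            merged.append(words[i])
            i += 1
    return merged


lst1 = ['puffing', 'his', 'first', 'cigarette', 'in', 'weeks', 'in', 'weeks']
npis = [['in', 'weeks'], ['in', 'ages']]
print(merge_phrases(lst1, npis))
```

Because the input is scanned once from left to right and a new list is built, there are no indices to shift after each merge.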
A better way to do this would be through regular expressions.
Join the list into a string with a space as the separator.
Split the string on spaces, except those spaces surrounded by in<space>weeks.
Here, we can use negative lookbehind and negative lookahead.
code:
import re
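# split on a space unless it is preceded by 'in' or followed by 'weeks'
# (note: this also protects the space after any word ending in 'in'
# and the space before any 'weeks', not only the pair 'in weeks')
c = re.compile(r'(?<!in) (?!weeks)')
lst2 = c.split(' '.join(lst1))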
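The split pattern above works for the example, but it refuses to split on any space that follows "in" or precedes "weeks", so "begin the" or "two weeks" would stay glued together. A possible alternative is to match tokens instead of splitting: try each whole phrase first and fall back to a single word. This is a sketch, not a drop-in replacement; `merge_phrases_regex` is a hypothetical name, and it assumes a non-empty phrase list and words that contain no whitespace.

```python
import re


def merge_phrases_regex(words, phrases):
    """Tokenise the joined sentence, keeping each listed phrase as one token."""
    # re.escape guards against regex metacharacters inside the words.
    alternatives = '|'.join(re.escape(' '.join(p)) for p in phrases)
    # A phrase must start and end at a word edge (no non-space neighbour);
    # anything else is matched as a plain run of non-space characters.
    pattern = re.compile(r'(?<!\S)(?:' + alternatives + r')(?!\S)|\S+')
    return pattern.findall(' '.join(words))


lst1 = ['puffing', 'his', 'first', 'cigarette', 'in', 'weeks', 'in', 'weeks']
npis = [['in', 'weeks'], ['in', 'ages']]
print(merge_phrases_regex(lst1, npis))
```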