Concatenating selected strings in list of strings

The problem is as follows. I have a list of strings

lst1=['puffing','his','first','cigarette','in', 'weeks', 'in', 'weeks']

and I would like to obtain the list

lst2=['puffing','his','first','cigarette','in weeks', 'in weeks']

that is, to concatenate every occurrence of the sublist ['in', 'weeks'] (for reasons that are irrelevant here). I tried the code below, where find_sub_list1 is borrowed from another post (and included in the code):

npis = [['in', 'weeks'], ['in', 'ages']]

# given a list and a candidate sublist, return the indices of the first and last
# element of each occurrence of the sublist within the list
def find_sub_list1(sl,l):
    results=[]
    sll=len(sl)
    for ind in (i for i,e in enumerate(l) if e==sl[0]):
        if l[ind:ind+sll]==sl:
            results.append((ind,ind+sll-1))

    return results

def concatenator(sent, npis):
    indices = []
    for npi in npis:
        indices_temp = find_sub_list1(npi, sent)
        if indices_temp != []:
            indices.extend(indices_temp)
    sorted(indices, key=lambda x: x[0])

    for (a,b) in indices:
        diff = b - a
        sent[a:b+1] = [" ".join(sent[a:b+1])]
        del indices[0]
        indices = [(a - diff, b - diff) for (a,b) in indices]

    return sent

Instead of the desired lst2, this code returns:

>>> concatenator(lst1, npis)
['puffing', 'his', 'first', 'cigarette', 'in weeks', 'in', 'weeks']

so it only concatenates the first occurrence. Any ideas about where the code is failing?

asked May 02 '17 03:05

Orest Xherija

1 Answer

Since the desired sub-sequence is 'in' 'weeks', and possibly 'in' 'ages', one possible solution (the looping is not very elegant, though) is:

First, find all positions where 'in' occurs.
Then iterate through the source list, appending elements to the target list and treating the positions of 'in' specially: if the following word is in a special set, join the two with a space and append the result to the target, advancing the index one extra time.
Once the source list is exhausted, an IndexError is raised, indicating that we should break out of the loop.

code:

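# positions of every 'in' in the source list
index_in = [i for i, word in enumerate(lst1) if word == 'in']

lst2 = []; n = 0

while True:
    try:
        # merge 'in' with the next word when that word is in the special set
        if n in index_in and n + 1 < len(lst1) and lst1[n+1] in ['weeks', 'ages']:
            lst2.append(lst1[n] + ' ' + lst1[n+1])
            n += 1  # skip the word that was just merged
        else:
            lst2.append(lst1[n])  # raises IndexError once n is past the end
        n += 1
    except IndexError:
        break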
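
The same single-pass idea can also take its phrases from npis instead of hard-coding 'in', 'weeks' and 'ages'. The following is only a sketch: merge_phrases is a hypothetical name that does not appear in the question, and it assumes each phrase is a non-empty list of words and that the first matching phrase in npis should win.

```python
def merge_phrases(words, phrases):
    """Return a new list in which each occurrence of a phrase is joined by spaces."""
    merged = []
    n = 0
    while n < len(words):
        for phrase in phrases:
            # compare the slice starting at n with the candidate phrase
            if phrase and words[n:n + len(phrase)] == phrase:
                merged.append(" ".join(phrase))
                n += len(phrase)  # skip the words that were just merged
                break
        else:
            # no phrase starts at position n, so copy the word unchanged
            merged.append(words[n])
            n += 1
    return merged


lst1 = ['puffing', 'his', 'first', 'cigarette', 'in', 'weeks', 'in', 'weeks']
npis = [['in', 'weeks'], ['in', 'ages']]
lst2 = merge_phrases(lst1, npis)
print(lst2)
```

Because the result is built in a fresh list, lst1 is left unmodified and no index bookkeeping is needed.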

A better way to do this would be through regular expressions.

Join the list into a string with a space as the separator.
Split the string on spaces, except those spaces surrounded by in<space>weeks. Here, we can use negative lookahead and lookbehind.

code:

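import re

# split on a space unless the word 'in' comes right before it
# and the word 'weeks' comes right after it
c = re.compile(r'(?<!\bin) | (?!weeks\b)')

lst2 = c.split(' '.join(lst1))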
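
As for where the original concatenator fails, a reading of the code suggests two things. First, sorted(indices, ...) returns a new list that is thrown away, so indices is never actually sorted. Second, and more importantly, the for loop keeps iterating over the original list object while del indices[0] shrinks it, and the next line rebinds the name indices to a brand new list. After the first pass the original list has one element left and the iterator is already at position 1, so the loop ends after a single merge. One way around both issues, sketched below, is to merge from right to left so that the positions of earlier matches stay valid and no shifting is required. concatenator_fixed is a hypothetical name, and the sketch assumes the matches do not overlap.

```python
def find_sub_list1(sl, l):
    """Return (first, last) index pairs for every occurrence of sl in l."""
    results = []
    sll = len(sl)
    for ind in (i for i, e in enumerate(l) if e == sl[0]):
        if l[ind:ind + sll] == sl:
            results.append((ind, ind + sll - 1))
    return results


def concatenator_fixed(sent, npis):
    indices = []
    for npi in npis:
        indices.extend(find_sub_list1(npi, sent))
    result = list(sent)  # work on a copy so the caller's list is untouched
    # replace the rightmost match first; positions to its left stay valid
    for a, b in sorted(indices, reverse=True):
        result[a:b + 1] = [" ".join(result[a:b + 1])]
    return result


lst1 = ['puffing', 'his', 'first', 'cigarette', 'in', 'weeks', 'in', 'weeks']
npis = [['in', 'weeks'], ['in', 'ages']]
print(concatenator_fixed(lst1, npis))
```

Iterating over sorted(...) also means the loop never walks a list that is being modified at the same time.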

answered Sep 30 '22 09:09

Haleemur Ali


Tags: python, string, list, python-3.x, tuples
