Python conditional joining of *consecutive* strings that don't end in punctuation with those that do

I have a list of words:

list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']

I would like to join them into:

list2 = ['hello how are you?', 'i am fine thanks.', 'great!']

Is there a simple, Pythonic way to do this? I have considered an itertools.groupby join, but the problem is that the elements of a group don't all meet the same criterion (I can't just check whether each one ends in punctuation). Basically, whether element x belongs to a group depends on some later element x+n, where n can be large. This complicates the problem.

asked Feb 15 '18 at 19:02 by sfortney



2 Answers

Don't use groupby(); you'd get separate groups for those words with and without punctuation, which you then have to re-combine.
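
To see the problem concretely, here is a small sketch of what groupby() yields for the list in the question when keyed on whether each word ends in punctuation. The key function and the variable names are my own choices for illustration:

```python
import string
from itertools import groupby

list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
punctuation = tuple(string.punctuation)

# Group consecutive words by whether they end in a punctuation character.
groups = [
    (ends_in_punct, list(words))
    for ends_in_punct, words in groupby(list1, key=lambda w: w.endswith(punctuation))
]

for ends_in_punct, words in groups:
    print(ends_in_punct, words)
# False ['hello', 'how', 'are']
# True ['you?']
# False ['i', 'am', 'fine']
# True ['thanks.', 'great!']
```

Note that 'thanks.' and 'great!' also end up in one group because they are consecutive, so the result would need splitting as well as re-combining.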

Use a generator function:

import string

def sentence_groups(l, punctuation=tuple(string.punctuation)):
    group = []
    for w in l:
        group.append(w)
        if w.endswith(punctuation):
            yield group
            group = []
    if group:
        yield group

The generator collects words from the input list until one ends with punctuation. At that point the whole group is yielded and a fresh, empty group is started.

When iteration ends and there are still words in the group, that last group is yielded too (even though they don't have punctuation at the end).

Use this together with str.join() to produce your output:

>>> list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
>>> [' '.join(group) for group in sentence_groups(list1)]
['hello how are you?', 'i am fine thanks.', 'great!']

I used all punctuation in the string.punctuation string; this is quite broad:

>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

Pass in a tuple of specific punctuation characters as the second argument if you want to narrow that down, or hardcode your own definition.
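
As a rough sketch of that narrowing, the example below repeats the generator and compares the default against a tuple of sentence-ending characters only. The name SENTENCE_END and the sample word list are assumptions made up for this illustration:

```python
import string

def sentence_groups(l, punctuation=tuple(string.punctuation)):
    group = []
    for w in l:
        group.append(w)
        if w.endswith(punctuation):
            yield group
            group = []
    if group:
        yield group

# Hypothetical narrower definition: only characters that end a sentence.
SENTENCE_END = ('.', '?', '!')

words = ['well,', 'hello', 'there.', 'bye']

# With the default, the comma after 'well' already closes a group.
broad = [' '.join(g) for g in sentence_groups(words)]
# With the narrower tuple, the comma is ignored and only '.' closes a group.
narrow = [' '.join(g) for g in sentence_groups(words, SENTENCE_END)]

print(broad)   # ['well,', 'hello there.', 'bye']
print(narrow)  # ['well, hello there.', 'bye']
```

In both cases the trailing 'bye' is still yielded as its own group, even though it has no punctuation at the end.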

answered Sep 29 '22 at 22:09 by Martijn Pieters


A humble solution:

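import string

words = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
sents = []

range_flag = 0  # index of the first word of the current sentence
for index, word in enumerate(words):
    if word[-1] in string.punctuation:
        # Slice from the start of the sentence through the punctuated word.
        sents.append(words[range_flag:index + 1])
        print(range_flag, index)
        range_flag = index + 1  # the next sentence starts after this word

# Keep any trailing words that never reached a punctuation mark.
if range_flag < len(words):
    sents.append(words[range_flag:])

print([" ".join(s) for s in sents])

0 3
4 7
8 8
['hello how are you?', 'i am fine thanks.', 'great!']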
answered Sep 29 '22 at 20:09 by jrjames83