I have a list of words,
list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
Which I would like to join to be,
list2 = ['hello how are you?', 'i am fine thanks.', 'great!']
Is there a simple Pythonic way to do this? I have considered an itertools.groupby join, but the problem is that the elements of a group don't all meet the same criterion (I can't just ask whether they all have punctuation). Basically, whether or not element x gets included depends on element x+n, where n can be large. This complicates the problem.
In Python 3, string.punctuation is a pre-initialized string constant containing all ASCII punctuation characters. It takes no parameters, since it is a constant and not a function. Note: import the string module before using string.punctuation.
There is another, more powerful, way to join strings together. You can go from a list to a string in Python with the join() method. The common use case is when you have an iterable, such as a list, made up of strings, and you want to combine those strings into a single string. Like .split(), .join() is a string instance method.
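As a small illustration of the join() and split() pairing described above, here is a minimal sketch. The variable names are made up for this example and do not come from the question.

```python
# Illustrative only: these variable names are hypothetical.
words = ['i', 'am', 'fine', 'thanks.']

# str.join() is called on the separator and takes an iterable of strings.
sentence = ' '.join(words)
print(sentence)

# str.split() with no arguments splits on runs of whitespace,
# which undoes the join for words that contain no whitespace themselves.
round_trip = sentence.split()
print(round_trip == words)
```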
This is the else clause of a conditional statement in Python. The else clause is the counterpart to if: it says what should happen when the if condition does not hold. Whenever the if condition evaluates to False and the if block is skipped, the else block executes.
In Python there is a strong distinction between statements and expressions. break is a statement, and it can therefore not be used in the conditional expression which works on expressions. I'll also note that the else is mandatory.
Don't use groupby(); you'd get separate groups for those words with and without punctuation, which you then have to re-combine.
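To see what groupby() would produce here, the following sketch groups the question's list by whether each word ends in punctuation. The key function ends_sentence is a hypothetical helper written for this illustration, not part of the answer.

```python
import string
from itertools import groupby

list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']

def ends_sentence(word):
    # Hypothetical key function: True when the word ends in punctuation.
    return word.endswith(tuple(string.punctuation))

# groupby() starts a new group every time the key value changes.
groups = [(key, list(group)) for key, group in groupby(list1, key=ends_sentence)]
for key, group in groups:
    print(key, group)
```

The words before each punctuated word land in a different group from the punctuated word itself, and the consecutive words 'thanks.' and 'great!' end up together in one group even though they close two different sentences.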
Use a generator function:
import string

def sentence_groups(l, punctuation=tuple(string.punctuation)):
    group = []
    for w in l:
        group.append(w)
        if w.endswith(punctuation):
            yield group
            group = []
    if group:
        yield group
The generator collects words from the input list until one ends with punctuation, at which point that whole group is yielded, after which the group is cleared for a new group.
When iteration ends and there are still words in the group, that last group is yielded too (even though they don't have punctuation at the end).
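The trailing-group behaviour can be checked with a short sketch. It repeats the generator from above so that it runs on its own; the input list is invented for this example.

```python
import string

def sentence_groups(l, punctuation=tuple(string.punctuation)):
    group = []
    for w in l:
        group.append(w)
        if w.endswith(punctuation):
            yield group
            group = []
    if group:
        # Leftover words that never reached a punctuated word.
        yield group

# Made-up input: the last two words are not followed by punctuation.
trailing = list(sentence_groups(['all', 'good.', 'see', 'you']))
print(trailing)

# An empty input yields no groups at all.
empty = list(sentence_groups([]))
print(empty)
```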
Use this together with str.join() to produce your output:
>>> list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
>>> [' '.join(group) for group in sentence_groups(list1)]
['hello how are you?', 'i am fine thanks.', 'great!']
I used all punctuation in the string.punctuation string; this is quite broad:
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
Pass in a tuple of specific punctuation characters as the second argument if you want to narrow that down, or hardcode your own definition.
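As a sketch of that narrowing, the example below compares the default with a tuple containing only '.', '!' and '?'. The choice of those three characters and the sample word list are assumptions made for illustration; the generator is repeated so the block runs on its own.

```python
import string

def sentence_groups(l, punctuation=tuple(string.punctuation)):
    group = []
    for w in l:
        group.append(w)
        if w.endswith(punctuation):
            yield group
            group = []
    if group:
        yield group

# Made-up input containing a comma, which is part of string.punctuation.
words = ['well,', 'hello', 'there.', 'fine', 'thanks!']

# Default: the comma also ends a group.
default = [' '.join(g) for g in sentence_groups(words)]
print(default)

# Narrowed (assumed set): only sentence-ending characters end a group.
narrow = [' '.join(g) for g in sentence_groups(words, ('.', '!', '?'))]
print(narrow)
```

With the default, 'well,' becomes a group of its own; with the narrowed tuple it stays attached to the sentence that follows it.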
A humble solution:
import string
words = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
sents = []
range_flag = 0  # index where the current sentence starts
for index, word in enumerate(words):
    if word[-1] in string.punctuation:
        # Slice from the start of the sentence up to and including this word.
        sents.append(words[range_flag:index+1])
        print(range_flag, index)
        range_flag = index + 1  # the next sentence starts after this word
print([" ".join(s) for s in sents])
0 3
4 7
8 8
['hello how are you?', 'i am fine thanks.', 'great!']