Possible Duplicate:
Python split() without removing the delimiter
I wish to split a string as follows:
text = " T?e qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG? !"
result -> (" T?e qu!ck ' brown 1 fox!", "jumps-.ver.", "the 'lazy' doG?", "!")
So basically I want to split at ". ", "! " or "? ", removing the spaces at the split points but keeping the dot, exclamation mark or question mark.
How can I do this in an efficient way?
The str.split method takes only one separator. I wonder whether the best solution is to split on all spaces and then find the pieces that end with a dot, exclamation mark or question mark when constructing the required result.
You can achieve this using a regular expression split:
>>> import re
>>> text = " T?e qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG? !"
>>> re.split('(?<=[.!?]) +',text)
[" T?e qu!ck ' brown 1 fox!", 'jumps-.ver.', "the 'lazy' doG?", '!']
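The regular expression '(?<=[.!?]) +' matches a sequence of one or more spaces (' +'), but only if it is preceded by a ., ! or ? character ('(?<=[.!?])'). Because the lookbehind is zero-width, the punctuation is not part of the match, so it stays in the result while the spaces are dropped.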
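If you need this in more than one place, it may be convenient to wrap the pattern in a small helper. The following is only a sketch: the function name split_keep_punctuation and the constant _SENTENCE_BREAK are made up for illustration, and it assumes you also want to split on tabs and newlines, so it uses \s+ instead of the literal ' +' above.

```python
import re

# Zero-width lookbehind for '.', '!' or '?', followed by one or more
# whitespace characters. Assumption: \s+ is used here (rather than ' +')
# so that tabs and newlines after the punctuation are treated like spaces.
_SENTENCE_BREAK = re.compile(r'(?<=[.!?])\s+')


def split_keep_punctuation(text):
    """Split after '.', '!' or '?', dropping the whitespace but keeping the punctuation."""
    return _SENTENCE_BREAK.split(text)


text = " T?e qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG? !"
parts = split_keep_punctuation(text)
print(parts)
```

Compiling the pattern once with re.compile is optional; it mainly avoids repeating the pattern string at every call site.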