I have a piece of code which splits a string after commas and dots (but not when a digit is before or after a comma or dot):
text = "This is, a sample text. Some more text. $1,200 test."
print re.split('(?<!\d)[,.]|[,.](?!\d)', text)
The result is:
['This is', ' a sample text', ' Some more text', ' $1,200 test', '']
I don't want to lose the commas and dots. So what I am looking for is:
['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
Besides, if a dot in the end of text
it produces an empty string in the end of the list. Furthermore, there are white-spaces at the beginning of the split strings. Is there a better method without using re
? How would you do this?
Unfortunately you can't use re.split()
on a zero-length match, so unless you can guarantee that there will be whitespace after the comma or dot you will need to use a different approach.
Here is one option that uses re.findall()
:
>>> text = "This is, a sample text. Some more text. $1,200 test."
>>> print re.findall(r'(?:\d[,.]|[^,.])*(?:[,.]|$)', text)
['This is,', ' a sample text.', ' Some more text.', ' $1,200 test.', '']
This doesn't strip whitespace and you will get an empty match at the end if the string ends with a comma or dot, but those are pretty easy fixes.
If it is a safe assumption that there will be whitespace after every comma and dot you want to split on, then we can just split the string on that whitespace which makes it a little simpler:
>>> print re.split(r'(?<=[,.])(?<!\d.)\s', text)
['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With