python: split string after comma and dots

Question

I have a piece of code which splits a string after commas and dots (but not when a digit is before or after a comma or dot):

import re

text = "This is, a sample text. Some more text. $1,200 test."
print(re.split(r'(?<!\d)[,.]|[,.](?!\d)', text))

The result is:

['This is', ' a sample text', ' Some more text', ' $1,200 test', '']

I don't want to lose the commas and dots. So what I am looking for is:

['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']

Also, if the text ends with a dot, I get an empty string at the end of the list. Furthermore, the split strings have whitespace at the beginning. Is there a better method that doesn't use re? How would you do this?

Andrew Clark · Accepted Answer

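Unfortunately, in Python versions before 3.7 you can't use re.split() on a zero-length match (3.7 added support for splitting on empty matches). So on those versions, unless you can guarantee that there will be whitespace after the comma or dot, you will need to use a different approach.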
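On Python 3.7 or newer, re.split() does accept a pattern that matches the empty string. A minimal sketch, assuming the same rule as the patterns in this answer (do not split when the comma or dot is preceded by a digit): the split point sits just after the punctuation, so the delimiter stays attached to its piece, and a list comprehension removes the leading spaces and the empty trailing piece.

```python
import re

text = "This is, a sample text. Some more text. $1,200 test."

# Zero-width split point: just after a comma or dot that is NOT preceded by a digit.
# Requires Python 3.7+, where re.split() accepts patterns that match the empty string.
pattern = r'(?<=[,.])(?<!\d[,.])'

parts = re.split(pattern, text)
# Strip surrounding whitespace and drop empty pieces (e.g. the one after the final dot).
result = [p.strip() for p in parts if p.strip()]
print(result)
# ['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
```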

Here is one option that uses re.findall():

>>> import re
>>> text = "This is, a sample text. Some more text. $1,200 test."
>>> print(re.findall(r'(?:\d[,.]|[^,.])*(?:[,.]|$)', text))
['This is,', ' a sample text.', ' Some more text.', ' $1,200 test.', '']

This doesn't strip whitespace, and because the pattern can also match the empty string at the end of the text, you will get an empty string as the last item of the list. Both are easy to fix.
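The two fixes can be applied in a single list comprehension. A minimal sketch: strip each match and keep only the non-empty ones.

```python
import re

text = "This is, a sample text. Some more text. $1,200 test."

# The findall pattern from this answer: a run of items that are either a digit
# followed by a comma/dot (kept together) or any non-comma/dot character,
# ending in a comma, a dot, or the end of the string.
pattern = r'(?:\d[,.]|[^,.])*(?:[,.]|$)'
matches = re.findall(pattern, text)

# Strip leading/trailing whitespace and discard the empty match at the end.
cleaned = [m.strip() for m in matches if m.strip()]
print(cleaned)
# ['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
```

The same cleanup also handles text that does not end with punctuation, e.g. "a, b" gives ['a,', 'b'].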

If it is a safe assumption that there will be whitespace after every comma and dot you want to split on, then we can just split the string on that whitespace, which makes it a little simpler:

>>> print(re.split(r'(?<=[,.])(?<!\d.)\s', text))
['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
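The question also asks whether this can be done without re. A minimal sketch of one way, assuming the same rule as the patterns above (a comma or dot ends a piece unless the character before it is a digit); the helper name split_keep_punct is hypothetical. It walks the string once, cuts just after each qualifying comma or dot, and strips the pieces, so it does not depend on whitespace following the punctuation.

```python
def split_keep_punct(text):
    """Hypothetical helper: split after ',' or '.' unless preceded by a digit.

    The punctuation stays attached to the piece it ends; surrounding
    whitespace is stripped and empty pieces are dropped.
    """
    pieces = []
    start = 0
    for i, ch in enumerate(text):
        # A comma or dot ends a piece unless the previous character is a digit.
        if ch in ",." and not (i > 0 and text[i - 1].isdigit()):
            pieces.append(text[start:i + 1])
            start = i + 1
    pieces.append(text[start:])  # whatever follows the last split point
    return [p.strip() for p in pieces if p.strip()]


text = "This is, a sample text. Some more text. $1,200 test."
print(split_keep_punct(text))
# ['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
```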

Tags:

python

regex

Johnny