
Python: split string after commas and dots

Tags: python, regex

I have a piece of code which splits a string after commas and dots (but not when a digit is before or after a comma or dot):

import re
text = "This is, a sample text. Some more text. $1,200 test."
print(re.split(r'(?<!\d)[,.]|[,.](?!\d)', text))

The result is:

['This is', ' a sample text', ' Some more text', ' $1,200 test', '']

I don't want to lose the commas and dots. So what I am looking for is:

['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']

Also, if the text ends with a dot, the result has an empty string at the end of the list. Furthermore, the split strings start with whitespace. Is there a better method without using re? How would you do this?

Johnny asked Jan 12 '23 13:01



1 Answer

Unfortunately, on Python versions before 3.7, re.split() can't split on a zero-length match, so unless you can guarantee that there will be whitespace after the comma or dot you will need to use a different approach.

Here is one option that uses re.findall():

>>> import re
>>> text = "This is, a sample text. Some more text. $1,200 test."
>>> print(re.findall(r'(?:\d[,.]|[^,.])*(?:[,.]|$)', text))
['This is,', ' a sample text.', ' Some more text.', ' $1,200 test.', '']

This doesn't strip whitespace and you will get an empty match at the end if the string ends with a comma or dot, but those are pretty easy fixes.
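A minimal sketch of those fixes, assuming Python 3 and that empty or whitespace-only pieces should simply be dropped. The helper name split_keep_punct is made up for illustration and is not part of re:

```python
import re

def split_keep_punct(text):
    """Split after commas and dots, keeping them, but not inside numbers like 1,200."""
    pieces = re.findall(r'(?:\d[,.]|[^,.])*(?:[,.]|$)', text)
    # Strip surrounding whitespace and drop pieces that end up empty,
    # which also removes the empty match at the end of the string.
    return [p.strip() for p in pieces if p.strip()]

print(split_keep_punct("This is, a sample text. Some more text. $1,200 test."))
# ['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
```

One caveat: this pattern treats any digit followed by a comma or dot as part of a number, so a sentence that ends in a digit (for example "It costs 200. Next") is not split at that dot.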

If it is safe to assume that there will be whitespace after every comma and dot you want to split on, then we can just split the string on that whitespace, which makes it a little simpler:

>>> print(re.split(r'(?<=[,.])(?<!\d.)\s', text))
['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
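For newer interpreters: the restriction on zero-length matches in re.split() was lifted in Python 3.7, so there the whitespace after the comma or dot can be made optional. The following is only a sketch under that assumption. The pattern is my own variation rather than something from the question, and it skips a comma or dot only when digits sit on both sides of it:

```python
import re

text = "This is, a sample text. Some more text. $1,200 test."

# Split point: right after a comma or dot, unless it has a digit on both
# sides (as in 1,200). Any whitespace after the split point is consumed.
# Requires Python 3.7+, because the pattern can match an empty string.
pattern = r'(?<=[,.])(?:(?<!\d[,.])|(?!\d))\s*'

# A trailing comma or dot produces an empty last piece, so filter it out.
parts = [p for p in re.split(pattern, text) if p]
print(parts)
# ['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
```

Unlike the whitespace-based split, this also handles input such as "a,b", where nothing follows the comma.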
Andrew Clark answered Jan 21 '23 15:01
