Split by \b when your regex engine doesn't support it

Tags: python, regex
How can I split by word boundary in a regex engine that doesn't support it?

Python's re module can match on \b, but it doesn't seem to support splitting on it. I recall other regex engines having the same limitation.

example input:

"hello, foo"

expected output:

['hello', ', ', 'foo']

actual Python output:

>>> re.compile(r'\b').split('hello, foo')
['hello, foo']
ʞɔıu asked Dec 14 '22 05:12

1 Answer

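Splitting on (\W+) gives the expected output. Because the pattern is a capturing group, re.split keeps the matched separators in the result: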

>>> re.compile(r'(\W+)').split('hello, foo')
['hello', ', ', 'foo']
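
One caveat worth sketching: when the input starts or ends with non-word characters, splitting on (\W+) leaves empty strings at the ends of the list. The snippet below is a minimal illustration of two ways around that, not a definitive recipe. The helper names split_keep_separators and tokenize_runs are made up for this example, and it assumes that "word" means whatever \w matches.

```python
import re

def split_keep_separators(text):
    """Split on runs of non-word characters, keeping the separators."""
    # The capturing group makes re.split return the separators as well.
    # Empty strings show up when text starts or ends with a separator,
    # so they are filtered out here.
    return [piece for piece in re.split(r'(\W+)', text) if piece]

def tokenize_runs(text):
    """Return the alternating runs of word and non-word characters."""
    # Matching the pieces directly avoids the empty strings altogether.
    return re.findall(r'\w+|\W+', text)

print(split_keep_separators('hello, foo'))  # ['hello', ', ', 'foo']
print(tokenize_runs(', hello!'))            # [', ', 'hello', '!']
```

Both helpers return the same pieces for any input, since every character is either a word character or not. As far as I know, Python 3.7 and later also let re.split split on patterns that can match the empty string, so r'\b' may work directly there, though it still produces empty strings at the ends when the text begins and ends with word characters.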
Christian C. Salvadó answered Dec 29 '22 09:12