Python re can't split zero-width anchors? [duplicate]

Question

import re

s = 'PythonCookbookListOfContents'

# the first line does not split the string
print(re.split('(?<=[a-z])(?=[A-Z])', s))

# the second line works well
print(re.sub('(?<=[a-z])(?=[A-Z])', ' ', s))

# the expected result is ['Python', 'Cookbook', 'List', 'Of', 'Contents']

How can I split a string at the boundary between a lowercase character and an uppercase character using Python's re module?

Why does the first line fail while the second line works?

falsetru · Accepted Answer

According to the re.split documentation (for Python versions before 3.7):

Note that split will never split a string on an empty pattern match. For example:
>>> re.split('x*', 'foo')
['foo']
>>> re.split("(?m)^$", "foo\n\nbar\n")
['foo\n\nbar\n']

How about using re.findall instead? (Instead of focusing on separators, focus on the item you want to get.)

>>> import re
>>> s = 'PythonCookbookListOfContents'
>>> re.findall('[A-Z][a-z]+', s)
['Python', 'Cookbook', 'List', 'Of', 'Contents']
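
Another workaround builds on the re.sub call from the question, which already handles the zero-width match: insert a delimiter at each boundary, then split on that delimiter with str.split. This is only a sketch, and it assumes the input contains no spaces of its own; the variable name parts is illustrative.

```python
import re

s = 'PythonCookbookListOfContents'

# re.sub accepts the zero-width pattern, so use it to mark each
# lowercase/uppercase boundary with a space, then split on the space.
# Assumption: the input string contains no spaces of its own.
parts = re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', s).split(' ')

print(parts)  # ['Python', 'Cookbook', 'List', 'Of', 'Contents']
```

If the input may contain spaces, a delimiter that cannot occur in the data would be needed instead.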

UPDATE

Using the regex module (an alternative regular expression module intended to replace re), you can split on a zero-width match:

>>> import regex
>>> s = 'PythonCookbookListOfContents'
>>> regex.split('(?<=[a-z])(?=[A-Z])', s, flags=regex.VERSION1)
['Python', 'Cookbook', 'List', 'Of', 'Contents']

NOTE: Specify the regex.VERSION1 flag to enable the split-on-zero-length-match behavior.
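
For comparison, the standard library's re.split accepts patterns that can match an empty string from Python 3.7 onward, so the original call from the question works there without the regex module. The sketch below assumes such an interpreter; the fallback to re.findall on older versions is included only for illustration.

```python
import re
import sys

s = 'PythonCookbookListOfContents'
pattern = r'(?<=[a-z])(?=[A-Z])'

if sys.version_info >= (3, 7):
    # Python 3.7+ splits on empty (zero-width) matches.
    parts = re.split(pattern, s)
else:
    # Older versions never split on an empty match, so collect
    # the capitalized words directly instead.
    parts = re.findall('[A-Z][a-z]+', s)

print(parts)  # ['Python', 'Cookbook', 'List', 'Of', 'Contents']
```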

Tags:

python

regex
