import re
s = 'PythonCookbookListOfContents'
# the first line does not work
print re.split('(?<=[a-z])(?=[A-Z])', s )
# second line works well
print re.sub('(?<=[a-z])(?=[A-Z])', ' ', s)
# it should be ['Python', 'Cookbook', 'List', 'Of', 'Contents']
How to split a string from the border of a lower case character and an upper case character using Python re?
Why does the first line fail to work while the second line works well?
According to re.split
:
Note that split will never split a string on an empty pattern match. For example:
>>> re.split('x*', 'foo') ['foo'] >>> re.split("(?m)^$", "foo\n\nbar\n") ['foo\n\nbar\n']
How about using re.findall
instead? (Instead of focusing on separators, focus on the item you want to get.)
>>> import re
>>> s = 'PythonCookbookListOfContents'
>>> re.findall('[A-Z][a-z]+', s)
['Python', 'Cookbook', 'List', 'Of', 'Contents']
UPDATE
Using regex
module (Alternative regular expression module, to replace re), you can split on zero-width match:
>>> import regex
>>> s = 'PythonCookbookListOfContents'
>>> regex.split('(?<=[a-z])(?=[A-Z])', s, flags=regex.VERSION1)
['Python', 'Cookbook', 'List', 'Of', 'Contents']
NOTE: Specify regex.VERSION1
flag to enable split-on-zero-length-match behavior.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With