Python regular expression, matching the last word

Question

I have the following problem. I want to find all the words in a string that typically looks like HelloWorldToYou. Notice that each word starts with a capital letter and is immediately followed by the next word, and so on. I want to create a list of words from it, so the expected output is a list that looks like this:

['Hello','World','To','You']

In Python, I used the following:

import re
mystr = 'HelloWorldToYou'
pat = re.compile(r'([A-Z](.*?))(?=[A-Z]+)')
print([x[0] for x in pat.findall(mystr)])
# prints ['Hello', 'World', 'To']

However, I'm unable to capture the last word, 'You'. Is there a way to get it? Thanks in advance.

Wiktor Stribiżew · Accepted Answer

Match an uppercase letter followed by lowercase letters, or keep your lazy .*? and add an alternation with $ to the lookahead:

import re
mystr = 'HelloWorldToYou'
pat = re.compile(r'([A-Z][a-z]*)')
# or your version with `.*?`: pat = re.compile(r'([A-Z].*?)(?=[A-Z]+|$)')
print(pat.findall(mystr))

Output:

['Hello', 'World', 'To', 'You']

Regex explanation:

([A-Z][a-z]*) - A capturing group that matches:
- [A-Z] - an uppercase English letter, followed by
- [a-z]* - zero or more lowercase English letters
  -OR-
- .*? - any characters other than a newline, as few as possible (lazy matching)

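The lookahead can be omitted if we use [a-z]*, but it is required with .*?, because a lazy pattern otherwise stops right after the first letter:

(?=[A-Z]+|$) - A positive lookahead that requires the match to be followed by an uppercase English letter (the + is redundant here and can be removed) OR by the end of the string ($).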
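To see what the $ alternative changes, here is a minimal sketch that runs the lookahead without and with the end-of-string branch on the string from the question. The names without_end and with_end are only illustrative, and the redundant + is dropped from both lookaheads.

```python
import re

mystr = 'HelloWorldToYou'

# Lookahead as in the question: each word must be followed by an uppercase letter,
# so the final word (followed by nothing) can never match.
without_end = re.compile(r'([A-Z].*?)(?=[A-Z])')

# Amended lookahead: an uppercase letter OR the end of the string may follow.
with_end = re.compile(r'([A-Z].*?)(?=[A-Z]|$)')

print(without_end.findall(mystr))  # ['Hello', 'World', 'To']
print(with_end.findall(mystr))     # ['Hello', 'World', 'To', 'You']
```

Because each pattern has exactly one capturing group, findall returns plain strings rather than tuples, so no x[0] indexing is needed.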

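If you do not use the lookahead version, you can also drop the capturing group, which avoids the small overhead of capturing, and collect the matches with finditer (findall returns the same strings once the pattern has no groups):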

import re
mystr = 'HelloWorldToYou'
pat = re.compile(r'[A-Z][a-z]*')
print([x.group() for x in pat.finditer(mystr)])
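As a quick check on the group-free pattern, the sketch below compares findall with finditer and then shows one case where the two approaches differ. The input 'Version2Released' and the name lazy are made up for illustration; they are not from the question.

```python
import re

pat = re.compile(r'[A-Z][a-z]*')
mystr = 'HelloWorldToYou'

# With no capturing groups, findall returns the full matches,
# which is the same list the finditer comprehension builds.
via_findall = pat.findall(mystr)
via_finditer = [m.group() for m in pat.finditer(mystr)]
print(via_findall == via_finditer)  # True

# The patterns diverge once a word contains something other than
# lowercase letters: [a-z]* stops at the digit, the lazy .*? keeps it.
lazy = re.compile(r'[A-Z].*?(?=[A-Z]|$)')
print(pat.findall('Version2Released'))   # ['Version', 'Released']
print(lazy.findall('Version2Released'))  # ['Version2', 'Released']
```

Which behaviour is preferable depends on the input: [A-Z][a-z]* is stricter and simpler, while the lookahead version keeps every character between two capitals.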
