Split by suffix with Python regular expression

Question

I want to split strings only by suffixes. For example, I would like to split "dord word" into ['dor', 'wor'].

I thought that \wd would search for words that end with d. However, this does not produce the expected results:

import re
re.split(r'\wd',"dord word")
['do', ' wo', '']

How can I split by suffixes?

vks · Accepted Answer

import re

x = 'dord word'
print(re.split(r"d\b", x))  # split on a "d" that ends a word

or

print([i for i in re.split(r"d\b", x) if i])  # if you don't want empty strings

Try this.
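Note that splitting on d\b leaves the separating space attached to the following piece. Here is a minimal sketch for Python 3, assuming you want the bare stems without whitespace. The helper name split_on_suffix is made up for illustration and is not part of the answer above:

```python
import re

def split_on_suffix(text, suffix="d"):
    """Split text on `suffix` wherever it ends a word; return stripped, non-empty pieces."""
    # re.escape keeps the suffix literal even if it contains regex metacharacters.
    pattern = re.escape(suffix) + r"\b"
    pieces = re.split(pattern, text)
    # Drop empty strings and the whitespace left over between words.
    return [p.strip() for p in pieces if p.strip()]

raw = re.split(r"d\b", "dord word")
print(raw)                           # ['dor', ' wor', '']
print(split_on_suffix("dord word"))  # ['dor', 'wor']
```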

Mazdak · Answer

A better approach is to use re.findall with r'\b(\w+)d\b' as your regex, which captures the part of each word before the final d:

>>> s = 'dord word'
>>> re.findall(r'\b(\w+)d\b', s)
['dor', 'wor']
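The same idea can be sketched for an arbitrary suffix. The function name stems and the sample strings are assumptions for illustration only. Words that do not end with the suffix are skipped rather than returned unchanged:

```python
import re

def stems(text, suffix):
    """Return the part before `suffix` for every word in text that ends with it."""
    # \w+ is greedy, but it backtracks so that the suffix can match at the word's end.
    pattern = r"\b(\w+)" + re.escape(suffix) + r"\b"
    return re.findall(pattern, text)

print(stems("dord word cat", "d"))              # ['dor', 'wor']
print(stems("walking talks running", "ing"))    # ['walk', 'runn']
```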

Wiktor Stribiżew · Answer

Since \w also matches digits and underscores, I would define a word as consisting only of letters, using the [a-zA-Z] character class:

print([x.group(1) for x in re.finditer(r"\b([a-zA-Z]+)d\b", "dord word")])

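To see where the two patterns differ, here is a small comparison on a made-up input that contains an underscore and digits:

```python
import re

text = "mod_d 42d word"  # hypothetical sample input

with_w = re.findall(r"\b(\w+)d\b", text)
letters_only = re.findall(r"\b([a-zA-Z]+)d\b", text)

print(with_w)        # ['mod_', '42', 'wor']
print(letters_only)  # ['wor']
```

With [a-zA-Z], tokens such as mod_d and 42d are not treated as words at all, so only word contributes a result.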

twasbrillig · Answer

If you're wondering why your original approach didn't work, look at what it does:

re.split(r'\wd',"dord word")

The pattern \wd matches every letter, digit, or underscore that is immediately followed by a "d", and re.split splits on each match. So it did this:

do[rd] wo[rd]

and split on the strings in brackets, removing them.
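One way to check this yourself is to list the matches with re.finditer before splitting. This is only an inspection sketch, not a fix:

```python
import re

text = "dord word"

# Each tuple is (start, end, matched text) for one match of \wd.
matches = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\wd", text)]
print(matches)                 # [(2, 4, 'rd'), (7, 9, 'rd')]

# re.split removes exactly those spans and returns what is left around them.
print(re.split(r"\wd", text))  # ['do', ' wo', '']
```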

Also note that this could split in the middle of words, so:

re.split(r'\wd', "said tendentious")

would split the second word in two, at the "nd" in "tendentious".
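For comparison, here is the same input split with the unanchored pattern and with a pattern anchored by \b (the word-boundary form used in the accepted answer):

```python
import re

text = "said tendentious"

unanchored = re.split(r"\wd", text)  # matches "id" and also "nd" inside the word
anchored = re.split(r"d\b", text)    # matches only a "d" at the end of a word

print(unanchored)  # ['sa', ' te', 'entious']
print(anchored)    # ['sai', ' tendentious']
```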
