I want to split strings only by suffixes. For example, I would like to be able to split "dord word" into [dor, wor].
I thought that \wd would search for words that end with d. However, this does not produce the expected results:
>>> import re
>>> re.split(r'\wd', "dord word")
['do', ' wo', '']
How can I split by suffixes?
x='dord word'
import re
print(re.split(r"d\b", x))  # ['dor', ' wor', '']
or
print([i for i in re.split(r"d\b", x) if i])  # if you don't want empty strings: ['dor', ' wor']
Try this.
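Note that splitting on d\b leaves the space in front of the second piece (' wor'). A minimal sketch of one way to tidy that up, assuming you want surrounding whitespace stripped; split_on_suffix is a hypothetical helper name, not something from the re module:

```python
import re

def split_on_suffix(text, suffix="d"):
    # Hypothetical helper: split wherever `suffix` ends a word.
    # re.escape guards against suffixes containing regex metacharacters.
    pattern = re.escape(suffix) + r"\b"
    pieces = re.split(pattern, text)
    # Strip whitespace and drop the empty strings left by the split.
    return [p.strip() for p in pieces if p.strip()]

print(split_on_suffix("dord word"))  # ['dor', 'wor']
```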
As a better way, you can use re.findall with r'\b(\w+)d\b' as your regex to find the part of each word before the final d:
>>> re.findall(r'\b(\w+)d\b', 'dord word')
['dor', 'wor']
Since \w also matches digits and underscores, I would define a word as consisting of just letters, using an [a-zA-Z] character class:
print([x.group(1) for x in re.finditer(r"\b([a-zA-Z]+)d\b", "dord word")])
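To illustrate the difference between the two character classes, here is a small sketch; the input string "id_3d word" is made up for the example:

```python
import re

text = "id_3d word"  # made-up input containing a digit and an underscore

# \w accepts letters, digits and underscores, so "id_3d" counts as a word.
with_w = re.findall(r"\b(\w+)d\b", text)

# [a-zA-Z] accepts letters only, so "id_3d" no longer matches as a whole.
letters_only = re.findall(r"\b([a-zA-Z]+)d\b", text)

print(with_w)        # ['id_3', 'wor']
print(letters_only)  # ['wor']
```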
If you're wondering why your original approach didn't work:
re.split(r'\wd',"dord word")
It finds every letter/digit/underscore that is immediately followed by a "d" and splits on each two-character match. So it did this:
do[rd] wo[rd]
and split on the strings in brackets, removing them.
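One way to see this for yourself is to ask re.findall what the pattern actually matches before splitting on it. A short sketch:

```python
import re

text = "dord word"

# The pattern matches one word character plus "d", i.e. "rd" in each word.
matches = re.findall(r"\wd", text)
print(matches)  # ['rd', 'rd']

# re.split removes those matches and returns what is left around them.
pieces = re.split(r"\wd", text)
print(pieces)   # ['do', ' wo', '']
```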
Also note that this could split in the middle of words, so:
re.split(r'\wd', "said tendentious")
would split the second word in two.
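A quick sketch comparing the unanchored pattern with one that uses \b word boundaries on that same input:

```python
import re

text = "said tendentious"

# Without word boundaries, "nd" in the middle of "tendentious" also matches.
mid_word = re.split(r"\wd", text)
print(mid_word)  # ['sa', ' te', 'entious']

# With \b on both sides, only a "d" that ends a whole word counts.
anchored = re.findall(r"\b(\w+)d\b", text)
print(anchored)  # ['sai']
```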