
Split by suffix with Python regular expression

I want to split strings only by suffixes. For example, I would like to split "dord word" into ['dor', 'wor'].

I thought that \wd would match words that end with d. However, this does not produce the expected result:

import re
re.split(r'\wd',"dord word")
['do', ' wo', '']

How can I split by suffixes?

kilojoules asked Jul 12 '15 19:07



4 Answers

import re
x = 'dord word'
# split on every "d" that sits at the end of a word
print(re.split(r"d\b", x))

or

print([i for i in re.split(r"d\b", x) if i])  # if you don't want empty strings

Try this.
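
One thing to watch with the split above: the space between the words stays attached to the second piece. A minimal sketch of one way to tidy that up, assuming you want bare stems (the names raw_pieces and stems are just illustrative):

```python
import re

x = "dord word"

# Splitting on a word-final "d" leaves the separating space on the next piece
# and an empty string after the last match.
raw_pieces = re.split(r"d\b", x)

# Strip whitespace and drop anything that ends up empty.
stems = [piece.strip() for piece in raw_pieces if piece.strip()]

print(raw_pieces)  # ['dor', ' wor', '']
print(stems)       # ['dor', 'wor']
```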

vks answered Sep 28 '22 20:09



A better way is to use re.findall with r'\b(\w+)d\b' as your regex, which captures the part of each word that precedes the final d:

>>> import re
>>> s = 'dord word'
>>> re.findall(r'\b(\w+)d\b', s)
['dor', 'wor']
Mazdak answered Sep 28 '22 19:09



Since \w also matches digits and underscores, I would define a word as consisting of just letters, using an [a-zA-Z] character class:

import re
print([x.group(1) for x in re.finditer(r"\b([a-zA-Z]+)d\b", "dord word")])

See demo
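
To make the difference concrete, here is a small sketch comparing the two patterns. The sample string is made up for illustration and is not from the question:

```python
import re

# Hypothetical input: one plain word, one with a digit, one with an underscore.
text = "dord w0rd snake_cased"

# \w accepts letters, digits, and underscores.
with_w = re.findall(r"\b(\w+)d\b", text)

# [a-zA-Z] accepts ASCII letters only, so the other two tokens do not match.
letters_only = re.findall(r"\b([a-zA-Z]+)d\b", text)

print(with_w)        # ['dor', 'w0r', 'snake_case']
print(letters_only)  # ['dor']
```

Note that [a-zA-Z] is ASCII-only, so accented letters would be skipped as well.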

Wiktor Stribiżew answered Sep 28 '22 20:09



If you're wondering why your original approach didn't work:

re.split(r'\wd',"dord word")

The pattern \wd matches any letter, digit, or underscore followed by a "d", and re.split splits on every match it finds. So it did this:

do[rd] wo[rd]

and split on the strings in brackets, removing them, which leaves ['do', ' wo', ''].

Also note that this could split in the middle of words, so:

re.split(r'\wd', "said tendentious")

would split the second word in two, at the "nd" in "tendentious".
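
A quick sketch of that behaviour, next to a version anchored with \b so that only a word-final "d" counts (the variable names are illustrative):

```python
import re

text = "said tendentious"

# Unanchored: "id" in "said" and "nd" in "tendentious" both match \wd.
unanchored = re.split(r"\wd", text)

# Anchored: the "d" must be the last character of a word.
anchored = re.findall(r"\b(\w+)d\b", text)

print(unanchored)  # ['sa', ' te', 'entious']
print(anchored)    # ['sai']
```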

twasbrillig answered Sep 28 '22 19:09
