
Python RegEx using re.sub with multiple patterns

Tags: python, regex

I'm trying to use Python's re.sub to remove a colon that sits directly before the antepenultimate vowel [aeiou] of a word, but only if the character on the other side of the colon is also a vowel.

In other words, the colon has to be between the 3rd and 4th vowels, counting from the end of the word.

With the vowels numbered from the end, the 1st example below breaks down like this: w4:32ny1h.

we:aanyoh > weaanyoh    # w4:32ny1h
hiru:atghigu > hiruatghigu
yo:ubeki > youbeki
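
To make the numbering concrete, here is a minimal sketch (Python 3) that replaces each vowel with its position counted from the end of the word. `label_vowels` is a hypothetical helper name used only for this illustration; it is not part of the `re` module.

```python
def label_vowels(word, vowels="aeiou"):
    """Replace each vowel with its 1-based position counted from the end."""
    total = sum(ch in vowels for ch in word)  # number of vowels in the word
    seen = 0
    out = []
    for ch in word:
        if ch in vowels:
            out.append(str(total - seen))  # the first vowel gets the highest number
            seen += 1
        else:
            out.append(ch)  # consonants and the colon pass through unchanged
    return "".join(out)


for w in ["we:aanyoh", "hiru:atghigu", "yo:ubeki"]:
    print(w, ">", label_vowels(w))
```

In each of the three examples the colon ends up between the vowels labelled 4 and 3, which is the condition the regex has to express.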

Below is the RegEx statement I'm trying to use but I can't get it to work.

word = re.sub(ur"([aeiou]):([aeiou])(([^aeiou])*([aeiou])*([aeiou])([^aeiou])*([aeiou]))$", ur'\1\2\3\4', word)
user2743 asked Oct 31 '22 15:10


2 Answers

Don't you just have too many parentheses (and other extra stuff)? This simpler pattern should do it:

word = re.sub(r"([aeiou]):(([aeiou][^aeiou]*){3})$", r'\1\2', word)
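
A quick way to see what the extra groups were doing is to run both patterns side by side. This is only a sketch for Python 3, so it uses plain r'' strings (the ur'' prefix exists only in Python 2); the names `ORIGINAL`, `SIMPLIFIED` and `examples` are labels chosen for this illustration.

```python
import re

# Pattern from the question: the tail must end in a vowel, and \4 refers to
# a repeated inner group, which only keeps its last captured consonant.
ORIGINAL = (r"([aeiou]):([aeiou])"
            r"(([^aeiou])*([aeiou])*([aeiou])([^aeiou])*([aeiou]))$")

# Simplified pattern: a vowel, the colon, then exactly three
# "vowel plus optional consonants" runs up to the end of the word.
SIMPLIFIED = r"([aeiou]):(([aeiou][^aeiou]*){3})$"

examples = ["we:aanyoh", "hiru:atghigu", "yo:ubeki"]

for word in examples:
    old = re.sub(ORIGINAL, r"\1\2\3\4", word)
    new = re.sub(SIMPLIFIED, r"\1\2", word)
    print("{}: original -> {}, simplified -> {}".format(word, old, new))
```

On these inputs the original pattern leaves we:aanyoh untouched (the word ends in a consonant, so the final `([aeiou])$` cannot match) and tacks a stray consonant onto the other two through `\4`, while the simplified pattern gives the outputs asked for.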
Jeff Y answered Nov 08 '22 04:11


I'm not sure whether you want to ignore consonants completely; this regex does, so consonants may sit on either side of the colon. Otherwise it is similar to Jeff's.

import re

tests = [
    'we:aanyoh',
    'hiru:atghigu',
    'yo:ubeki',
    'yo:ubekiki',
    'yo:ubek'
]

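for word in tests:
    # Drop the colon only when exactly three vowels follow it up to the end
    # of the word; consonants on either side of the colon are allowed.
    s = re.sub(r'([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$', r'\1\2', word)
    print('{} > {}'.format(word, s))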
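
To show what ignoring consonants changes in practice, this sketch (Python 3) runs Jeff's pattern and the one above on a word with a consonant between the vowel and the colon. The word 'yok:ubeki' is made up for the illustration and is not one of the asker's examples; `STRICT` and `LOOSE` are likewise just labels.

```python
import re

# Jeff's pattern: vowels must touch the colon on both sides.
STRICT = r"([aeiou]):(([aeiou][^aeiou]*){3})$"

# The pattern above: consonants may sit between the colon and the
# neighbouring vowels, as long as exactly three vowels follow the colon.
LOOSE = r"([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$"

for word in ["yo:ubeki", "yok:ubeki"]:
    strict = re.sub(STRICT, r"\1\2", word)
    loose = re.sub(LOOSE, r"\1\2", word)
    print("{}: strict -> {}, loose -> {}".format(word, strict, loose))
```

The two patterns agree on yo:ubeki; only the loose one removes the colon from the made-up yok:ubeki. Which behaviour is right depends on whether a consonant next to the colon should block the removal.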
Tom Zych answered Nov 08 '22 04:11