
Regex for a third-person verb

Tags: python, regex

I'm trying to create a regex that matches the third-person form of a verb created using the following rule:

If the verb ends in e not preceded by i, o, s, x, z, ch, sh, add s.

So I'm looking for a regex matching a word that consists of some letters, then something other than i, o, s, x, z, ch, sh, and then "es". I tried this:

\b\w*[^iosxz(sh)(ch)]es\b

According to regex101, it matches "likes", "hates", etc. However, it does not match "bathes". Why not?

maestromusica asked Nov 13 '16 09:11



2 Answers

You may use

\b(?=\w*(?<![iosxz])(?<![cs]h)es\b)\w*

See the regex demo

Python's re module requires a lookbehind to be fixed-width, so alternatives of different lengths (a single letter versus ch or sh) cannot share one lookbehind. You need to split the conditions into two lookbehinds here.
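The fixed-width restriction can be checked directly. The following is a minimal sketch, assuming the standard library re module; the helper name compiles is made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if Python's re accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# One lookbehind whose alternatives have different lengths (1 and 2 characters).
single = r'(?<![iosxz]|[cs]h)es\b'
# Two lookbehinds, each with a fixed width.
split = r'(?<![iosxz])(?<![cs]h)es\b'

print(compiles(single))  # False: look-behind requires fixed-width pattern
print(compiles(split))   # True
```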

Pattern details:

  • \b - a leading word boundary
  • (?=\w*(?<![iosxz])(?<![cs]h)es\b) - a positive lookahead requiring a sequence of:
    • \w* - 0+ word chars
    • (?<![iosxz]) - there must not be i, o, s, x, z chars right before the current location and...
    • (?<![cs]h) - no ch or sh right before the current location...
    • es - followed with es...
    • \b - at the end of the word
  • \w* - zero or more word chars that consume the word itself. Because the lookahead already guarantees at least "es", \w+ would behave identically here.
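To see which part of the lookahead rejects a given word, each condition can be tested separately. This is only an illustrative sketch: the function explain and the three pattern names are hypothetical helpers, not part of the answer's regex.

```python
import re

# Each piece of the lookahead, tested on its own against a single word.
ENDS_IN_ES = re.compile(r'\w*es')
NO_SINGLE_LETTER = re.compile(r'\w*(?<![iosxz])es')
NO_CH_SH = re.compile(r'\w*(?<![cs]h)es')

def explain(word):
    """Say which condition of the pattern rejects the word, if any."""
    if not ENDS_IN_ES.fullmatch(word):
        return "rejected: does not end in es"
    if not NO_SINGLE_LETTER.fullmatch(word):
        return "rejected: i, o, s, x or z before es"
    if not NO_CH_SH.fullmatch(word):
        return "rejected: ch or sh before es"
    return "accepted"

for word in ["likes", "bathes", "goes", "watches", "run"]:
    print(word, "->", explain(word))
```

With these sample words, "likes" and "bathes" are accepted, "goes" fails the single-letter check, "watches" fails the ch/sh check, and "run" does not end in es at all.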

See Python demo:

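import re

# The lookahead validates the ending; the trailing \w* then consumes the whole word.
r = re.compile(r'\b(?=\w*(?<![iosxz])(?<![cs]h)es\b)\w*')
s = 'it matches "likes", "hates" etc. However, it does not match "bathes", why doesn\'t it?'
print(r.findall(s))  # ['likes', 'hates', 'bathes']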
Wiktor Stribiżew answered Oct 14 '22 23:10


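If you want to match an e that is not preceded by i, o, s, x, z, ch or sh, the idea is a negative lookbehind:

(?<!i|o|s|x|z|ch|sh)e

Note that Python's re rejects this exact form, because the alternatives have different lengths and re requires a fixed-width lookbehind. In Python, write it as two lookbehinds: (?<![iosxz])(?<![cs]h)e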

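Your regex [^iosxz(sh)(ch)] is a single negated character class: the ^ negates it, and everything else inside the brackets, including the parentheses, is treated as an individual literal character. Duplicates make no difference, so it's equivalent to:

[^iosxzch()]

which actually means: "match any one character that's not one of "iosxzch()". Since h is in that set, the h before "es" in "bathes" cannot match, which is why the word is rejected.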
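The way the bracket expression is read can be confirmed with a short check. This is a sketch under the assumption that whole words are tested one at a time with fullmatch; the variable names and the word list are chosen for illustration.

```python
import re

# The asker's original pattern: inside [...], parentheses are literal characters.
original = re.compile(r'\b\w*[^iosxz(sh)(ch)]es\b')
# The same character class written out without duplicates.
flattened = re.compile(r'\b\w*[^iosxzch()]es\b')

words = ["likes", "hates", "bathes", "matches", "does"]
for word in words:
    # Both patterns give the same answer for every word.
    print(word, bool(original.fullmatch(word)), bool(flattened.fullmatch(word)))
```

Both patterns accept "likes" and "hates" and reject "bathes", "matches" and "does", because the letter before "es" in the rejected words (h or o) is a member of the negated set.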

Maroun answered Oct 14 '22 23:10