
Does a greedy "or" group exist in regex?

Tags:

python

regex

I have an automatically generated regular expression, which basically is one big "or" group like so:

(\bthe\b|\bcat\b|\bin\b|\bhat\.\b|\bhat\b)

I've noticed that in case of

hat.

It would match "hat" only, not "hat." as I want. Is there a way to make it more greedy?

UPDATE: forgot about word boundaries, sorry for that.

Andrew asked Apr 09 '12 03:04


1 Answer

Put hat\. before hat in the regular expression. In an alternation, the first alternative that matches wins. Since hat already matches the beginning of hat., the hat\. alternative is never tried when hat is listed first.
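To illustrate the ordering rule, here is a minimal sketch using Python's re module. Since the expression in the question is generated automatically, one option is to sort the alternatives longest-first before joining them. The helper name build_alternation and the word list are assumptions made for this example, not something from the question.

```python
import re

def build_alternation(words):
    # Hypothetical helper: put longer alternatives first so that
    # "hat." is tried before its prefix "hat".
    ordered = sorted(words, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered))

words = ["the", "cat", "in", "hat", "hat."]
text = "the cat in the hat."

# Naive order: "hat" comes before "hat." and wins, leaving the period behind.
naive = re.compile("|".join(re.escape(w) for w in words))
naive_matches = naive.findall(text)

# Longest-first order: "hat." gets its chance before "hat".
sorted_matches = build_alternation(words).findall(text)

print(naive_matches)
print(sorted_matches)
```

Sorting by length is only one way to guarantee that no alternative is shadowed by its own prefix. Any order that places hat\. ahead of hat behaves the same way.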

A better way would be to just write that part as hat\.? rather than hat\.|hat. That makes the period optional, so you don't need two terms in the alternation.
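A quick sketch of the optional-period form, with sample strings made up for the example. The ? quantifier is greedy, so the period is consumed whenever it is present:

```python
import re

# One term with an optional period instead of two alternatives.
pattern = re.compile(r"hat\.?")

with_period = pattern.search("the cat in the hat.").group(0)
without_period = pattern.search("the hat is red").group(0)

print(with_period)     # the period is included when present
print(without_period)  # plain "hat" still matches
```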

After your edit:

There is no word boundary between . and, say, a space (both are non-word characters). So \bhat\.\b is only going to match in things like hat.x, where another word character (a letter, digit, or underscore) immediately follows the period. In an ordinary sentence, where the period is followed by a space or the end of the string, \bhat\b will be the one that gets matched. I see you found a solution, however.
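The boundary behaviour can be checked with a short sketch. The sample strings are made up for the example. The last pattern is one possible workaround and only an assumption on my part (it is not necessarily the solution the asker found): it swaps \b for the lookarounds (?<!\w) and (?!\w), which only require that no word character sits next to the match.

```python
import re

# The two relevant alternatives from the question, with word boundaries.
with_boundaries = re.compile(r"\bhat\.\b|\bhat\b")

# Period followed by a space: no boundary after ".", so only "hat" matches.
in_sentence = with_boundaries.search("in the hat. ok").group(0)

# Period followed by a letter: there is a boundary, so "hat." matches.
before_letter = with_boundaries.search("hat.x").group(0)

# Possible workaround (an assumption, not the asker's solution):
# lookarounds that only forbid an adjacent word character.
with_lookarounds = re.compile(r"(?<!\w)(?:hat\.|hat)(?!\w)")
lookaround_match = with_lookarounds.search("in the hat. ok").group(0)

print(in_sentence)
print(before_letter)
print(lookaround_match)
```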

kindall answered Nov 07 '22 12:11