
Regex for overlapping matches

Tags: regex, php

For a linguistics project I am trying to match all occurrences of one or two consonants between vowels in some text. I am trying to write a very simple matcher in PHP (preg_match_all), but once a character has been consumed by one match, it cannot take part in the next one.

The following pattern is very simple and looks like it should do the trick, but it misses occurrences that overlap an earlier match:

[aeiou](qu|[bcdfghjklmnprstvwxyz]{1,2})[aeiou]

For the input officiosior, offi and osi are returned, but not ici, because the leading i of ici was already consumed as the trailing i of offi.
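A minimal PHP sketch that reproduces the behaviour, assuming the pattern is wrapped in / delimiters and applied to the single word officiosior (the variable names are only illustrative):

```php
<?php
// The pattern from the question, wrapped in / delimiters.
$pattern = '/[aeiou](qu|[bcdfghjklmnprstvwxyz]{1,2})[aeiou]/';
$word = 'officiosior';

// preg_match_all resumes scanning after the end of each match,
// so a vowel that closes one match cannot open the next one.
preg_match_all($pattern, $word, $matches);

// $matches[0] holds the full matches, $matches[1] the consonant group.
print_r($matches[0]); // offi, osi (ici is skipped)
```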

As far as I can tell, it's impossible to do, but is there a decent way to work around the issue?

Ryan Ward asked Dec 25 '22 07:12


1 Answer

You can use a positive lookahead assertion with a capturing group inside it to achieve this.

(?=([aeiou](?:qu|[^aeiou]{1,2})[aeiou]))

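A lookahead does not consume any characters of the string. After looking, the regular expression engine is back at the position where it started, so the next attempt begins just one character further along and can overlap the text that was just inspected. Because the overall match is empty, the matched text is read from capture group 1 rather than from the full match.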

Explanation:

(?=                    # look ahead to see if there is:
  (                    #   group and capture to \1:
    [aeiou]            #     any character of: 'a', 'e', 'i', 'o', 'u'
    (?:                #     group, but do not capture:
      qu               #       'qu'
     |                 #      OR
      [^aeiou]{1,2}    #       any character except: 'a', 'e', 'i', 'o', 'u' 
                       #       (between 1 and 2 times)
    )                  #     end of grouping
    [aeiou]            #     any character of: 'a', 'e', 'i', 'o', 'u'
  )                    #   end of \1
)                      # end of look-ahead
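As a rough sketch of how this could be used from PHP, assuming the same test word as in the question and / as the delimiter (the variable names are only illustrative):

```php
<?php
// Overlapping matches via a capturing group inside a lookahead.
$pattern = '/(?=([aeiou](?:qu|[^aeiou]{1,2})[aeiou]))/';
$word = 'officiosior';

preg_match_all($pattern, $word, $matches);

// The lookahead consumes nothing, so $matches[0] contains only
// empty strings; the captured text is in $matches[1].
print_r($matches[1]); // offi, ici, osi
```

Note that [^aeiou] matches any character that is not a lowercase vowel, including spaces, digits and punctuation. For running text it may be safer to substitute the explicit consonant class from the question, and to add the i modifier if the input can contain capitals.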


hwnd answered Dec 27 '22 19:12