For a linguistics project I am trying to match all occurrences of one or two consonants between vowels in some text. I am writing a very simple matcher in PHP (preg_match_all()), but once characters have been consumed by one match, they cannot be part of the next match.
The following pattern is very simple and should do the trick, but it misses occurrences that overlap a previous match:
[aeiou](qu|[bcdfghjklmnprstvwxyz]{1,2})[aeiou]
In "officiosior", "offi" and "osi" are returned, but not "ici", because the trailing i of "offi" would have to be the first character of the second match.
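A minimal reproduction of the behaviour described above, using the pattern from the question unchanged. The variable names are only illustrative.

```php
<?php
// Pattern from the question: vowel, then "qu" or 1-2 consonants, then vowel.
$pattern = '/[aeiou](qu|[bcdfghjklmnprstvwxyz]{1,2})[aeiou]/';

preg_match_all($pattern, 'officiosior', $matches);

// $matches[0] holds the full matches. Each match consumes its trailing vowel,
// so the next attempt starts after it and "ici" (which would reuse the i of
// "offi") is never found.
print_r($matches[0]); // offi, osi
```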
As far as I can tell, it's impossible to do, but is there a decent way to work around the issue?
You can use a Positive Lookahead assertion to achieve this.
(?=([aeiou](?:qu|[^aeiou]{1,2})[aeiou]))
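A lookahead does not consume any characters in the string. After looking, the regular expression engine is back at the same position in the string from where it started looking, so the overall match is empty. preg_match_all() then moves on by one character and tries again, which is what allows the results to overlap. Because nothing is consumed, the full matches ($matches[0]) are empty strings; the sequences themselves are in capture group 1 ($matches[1]).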
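A short sketch of the lookahead pattern in use with preg_match_all(), run on the word from the question. It shows where the results end up: group 0 is empty for every match, and group 1 carries the overlapping sequences.

```php
<?php
// The whole sequence sits inside a lookahead, so each match is zero-width
// and the engine can try again one character later.
$pattern = '/(?=([aeiou](?:qu|[^aeiou]{1,2})[aeiou]))/';

preg_match_all($pattern, 'officiosior', $matches);

// $matches[0] contains only empty strings (the lookahead consumes nothing);
// the actual sequences are captured in group 1.
print_r($matches[1]); // offi, ici, osi
```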
Explanation:
(?= # look ahead to see if there is:
( # group and capture to \1:
[aeiou] # any character of: 'a', 'e', 'i', 'o', 'u'
(?: # group, but do not capture:
qu # 'qu'
| # OR
[^aeiou]{1,2} # any character except: 'a', 'e', 'i', 'o', 'u'
# (between 1 and 2 times)
) # end of grouping
[aeiou] # any character of: 'a', 'e', 'i', 'o', 'u'
) # end of \1
) # end of look-ahead
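The annotated breakdown above can also be written directly as a PHP pattern using the x (extended) modifier, which ignores whitespace and allows # comments. The sketch below assumes PHP 7 or later, and the function name vcvSequences is hypothetical. It also swaps [^aeiou] back to the explicit consonant class from the question: [^aeiou] accepts any non-vowel character, including spaces and punctuation, so on "la ti" it would report "a ti" across the word break, while the explicit class does not.

```php
<?php
// Hypothetical helper: returns every vowel + (qu | 1-2 consonants) + vowel
// sequence in $text, including overlapping ones.
function vcvSequences(string $text): array
{
    $pattern = '~
        (?=                                   # look ahead, consuming nothing
          (                                   # capture group 1: the sequence
            [aeiou]                           # a vowel
            (?: qu                            # either qu
              | [bcdfghjklmnprstvwxyz]{1,2}   # or one or two consonants
            )
            [aeiou]                           # a vowel
          )
        )
    ~x';

    preg_match_all($pattern, $text, $matches);

    // Group 1 holds the sequences; group 0 is empty for every match.
    return $matches[1];
}

print_r(vcvSequences('officiosior')); // offi, ici, osi
```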