
Capturing words with an optional prefix

Tags:

regex

I need to extend an existing regex to catch also some optional prefix. My current regex is working fine:

(?:\b)(?:mon|tue|wed|thu|fri|sat|sun)(?:\b)

and matches any of these words separated by word boundaries. For instance, given the string "mon-sun.sat" it will match mon, sun and sat individually.
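A quick way to check this behaviour is the sketch below. It assumes Python's re module, which is my choice for illustration only (the question does not depend on a particular language), and the name DAY_RE is made up for the example.

```python
import re

# The regex from the question: a day abbreviation delimited by word boundaries.
DAY_RE = re.compile(r"(?:\b)(?:mon|tue|wed|thu|fri|sat|sun)(?:\b)")

# With no capturing groups, findall returns the full text of each match.
matches = DAY_RE.findall("mon-sun.sat")
print(matches)  # ['mon', 'sun', 'sat']
```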

Now, say the words above can optionally appear prefixed by a term like "each" "only" "any", for instance "mon. any-tue or only-wed. sat. each weekend"

I want to extend my regex to match and capture (in the example above) the terms mon, any tue, only wed and sat, but clearly not each, because it does not prefix a term from the list. In practice, the pattern to capture is: an optional prefix followed by a day of the week.

I have tried extending my regex in several ways but with no success. I guess I'm messing up with the word boundaries.

In other words: There are two sets of words say P={each,only,any} and W={mon,tue,wed,thu,fri,sat,sun}. I need to match any element w in W optionally prefixed by an element p in P. The separators can be any \b.

EDIT: my current attempt is (:?\b) ((any|only|each)?(:?\b)) (:?mon|tue|wed|thu|fri|sat|sun) (:?\b) but it only matches mon, tue, wed and sat, without the prefixes.

giog asked Oct 18 '22 12:10



1 Answer

You may use

\b(?:(any|only|each)\W+)?(mon|tue|wed|thu|fri|sat|sun)\b

See the regex demo
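As a rough check, here is the pattern applied to the sample string from the question. This is a minimal sketch assuming Python's re module (the language is an assumption, the pattern itself is the one above); PREFIXED_DAY_RE, text and pairs are names made up for the example.

```python
import re

# Pattern from the answer: optional prefix (group 1), then a day (group 2).
PREFIXED_DAY_RE = re.compile(
    r"\b(?:(any|only|each)\W+)?(mon|tue|wed|thu|fri|sat|sun)\b"
)

text = "mon. any-tue or only-wed. sat. each weekend"

# group(1) is None whenever the optional prefix did not take part in the match.
pairs = [(m.group(1), m.group(2)) for m in PREFIXED_DAY_RE.finditer(text)]
print(pairs)
# [(None, 'mon'), ('any', 'tue'), ('only', 'wed'), (None, 'sat')]
```

Note that "each weekend" yields nothing, since "weekend" is not one of the listed days.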

Details:

  • \b - a leading word boundary
  • (?:(any|only|each)\W+)? - an optional non-capturing group that matches 1 or 0 occurrences of:
    • (any|only|each) - a whole word, either any, only or each (the leading word boundary has already been asserted with the \b above, and the trailing word boundary is ensured by \W+)
    • \W+ - 1 or more non-word chars.
  • (mon|tue|wed|thu|fri|sat|sun)\b - a whole word (due to the initial \b or \W+ and a \b after the capturing group): either mon, tue, wed, thu, fri, sat or sun.
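The effect of each piece listed above can be probed with a few edge cases. The sketch below assumes Python's re module, and find_pairs is a hypothetical helper written only for this illustration.

```python
import re

PREFIXED_DAY_RE = re.compile(
    r"\b(?:(any|only|each)\W+)?(mon|tue|wed|thu|fri|sat|sun)\b"
)

def find_pairs(text):
    """Hypothetical helper: (prefix, day) tuples; prefix is None when absent."""
    return [m.groups() for m in PREFIXED_DAY_RE.finditer(text)]

# \W+ accepts any run of non-word chars, including whitespace and punctuation.
print(find_pairs("each   sat"))  # [('each', 'sat')]
print(find_pairs("only, sun"))   # [('only', 'sun')]

# The trailing \b rejects a day that is only the start of a longer word.
print(find_pairs("monday"))      # []

# Without a non-word separator there is no boundary before the day.
print(find_pairs("anymon"))      # []

# "_" is a word char, so it neither satisfies \W+ nor creates a \b.
print(find_pairs("any_mon"))     # []
```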

Note that the non-capturing group (?:...)? is used to wrap the optional subpattern because, unlike a capturing group, it does not create a memory buffer for the capture; the prefix itself is still captured by the inner (any|only|each) group. The ? is the quantifier that makes the whole subpattern sequence inside the group match 1 or 0 times. \W is a shorthand character class that consumes any non-word char, that is, any char other than a letter, digit or underscore (so punctuation, symbols and even whitespace will be matched).
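The difference between the two kinds of group shows up in the group numbering. A minimal sketch, again assuming Python's re module; the variable names are illustrative, and the second pattern is a variant written only for comparison, not something to prefer.

```python
import re

DAY = r"(mon|tue|wed|thu|fri|sat|sun)"

# Variant from the answer: the optional wrapper is non-capturing.
non_capturing = re.compile(r"\b(?:(any|only|each)\W+)?" + DAY + r"\b")

# Comparison variant: the same wrapper written as a capturing group.
capturing = re.compile(r"\b((any|only|each)\W+)?" + DAY + r"\b")

print(non_capturing.groups)  # 2
print(capturing.groups)      # 3

# The extra group also stores the separator, which is rarely wanted.
print(non_capturing.search("any-tue").groups())  # ('any', 'tue')
print(capturing.search("any-tue").groups())      # ('any-', 'any', 'tue')
```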

Wiktor Stribiżew answered Oct 21 '22 04:10
